The U. S. National Health Survey is a continuing program under which
the  Public  Health  Service  makes  studies  to  determine  the  extent  of  ill- 
ness and  disability  in  the  population  of  the  United  States  and  to  gather 
related  information.  It  is  authorized  by  Public  Law  652,  84th  Congress. 


CO-OPERATION  OF  THE  BUREAU  OF  THE  CENSUS 


Under  the  legislation  establishing  the  National  Health  Survey,  the 
Public  Health  Service  is  authorized  to  use,  insofar  as  possible,  the 
services  or  facilities  of  other  Federal,  State,  or  private  agencies.  For 
the  national  household  survey  the  Bureau  of  the  Census  designed  and 
selected  the  sample,  conducted  the  household  interviews,  and  processed 
the  data  in  accordance  with  specifications  established  by  the  Public 
Health  Service. 
PREFACE


This  report  presents  a description  of  the  initial  statistical 
design  of  the  continuing  Health  Household- Interview  Survey, 
which  is  a major  phase  of  the  program  of  the  U.  S.  National 
Health  Survey.  The  design  described  in  this  report  is  that  used 
during  the  period,  July-December  1957,  which,  with  minor 
modifications,  will  be  used  throughout  1958.  Except  for  such 
modifications,  the  design  is,  therefore,  the  basis  of  the  sta- 
tistical reports  being  published  from  the  household  interviews 
conducted  during  this  period. 

General  requirements  for  the  survey  design  were  pre- 
pared by  the  Public  Health  Service  and  on  the  basis  of  these, 
the  theoretical  and  operating  plan  of  the  sample  was  prepared 
by  the  staff  of  the  Census  Bureau.  Although  there  are  some 
important  differences,  the  sample  plan  for  this  health  survey 
draws  heavily  from  designs  previously  developed  by  the  Bureau 
of  the  Census  for  its  Current  Population  Survey. 

In  addition  to  its  function  as  the  principal  designer  of  the 
survey  sample  plan,  the  Census  Bureau  also  conducts  the  field 
interviewing,  and  processes  the  data  in  accordance  with  spec- 
ifications provided  by  the  Public  Health  Service.  Tabulation  is 
handled  on  the  Census  Bureau's  electronic  computers.  Final 
tables  and  published  reports  are  planned  and  prepared  by  the 
Public  Health  Service. 

Principal  responsibility  for  development  of  the  statistical 
design  and  preparation  of  the  text  of  this  report  was  shared  by 
William  N.  Hurwitz,  Harold  Nisselson,  Walt  R.  Simmons, 
Joseph  Steinberg,  Joseph  Waksberg,  and  Theodore  D.  Woolsey. 
(Messrs. Simmons and Woolsey are members of the U. S. National
Health Survey staff; Messrs. Hurwitz, Nisselson, Steinberg,
and Waksberg are staff members of the Bureau of the
Census.) They were assisted by numerous members of the
Census  Bureau  staff,  including  especially  Katherine  G.  Capt, 
Robert  H.  Finch,  Jr.,  Mary  J.  Jaracz,  Garrie  J.  Losee,  and 
Helen  M.  Lucas. 
STATISTICAL DESIGN OF THE

HEALTH  HOUSEHOLD-INTERVIEW  SURVEY 

1.  INTRODUCTION 


The  program  of  the  U.  S.  National  Health  Sur- 
vey is  a statistical  measurement  of  the  extent  of 
illness,  disability,  and  related  conditions  of  the 
population.  This  program  consists  of  several  dis- 
tinct but  related  parts.  One  of  these  is  the  collection 
of  data  on  health  through  a continuing  Health  House- 
hold-Interview Survey.  A second  main  part  of  the 
program  is  a series  of  surveys  which  utilize  pro- 
cedures other  than  household  interview  as  the 
source  of  data  on  health.  A third  phase  of  the  pro- 
gram evaluates  procedures  and  results  and  devel- 
ops improved  techniques  of  measurement. 

The  present  report  describes  the  statistical 
design  of  the  Health  Household-Interview  Survey. 
In  addition  to  setting  forth  the  pattern  of  the  Survey 
as  it  was  initiated  in  July  1957  and  as  it  functioned 
in  its  first  year  of  operation,  the  report  will  em- 
phasize two  further  points.  One  is  that  the  house- 
hold interviews,  while  independent  in  a statistical 
sense  of  other  surveys  in  the  program,  are  but  one 
very important component of the broader undertaking
which is the U. S. National Health Survey
(NHS). The second is that the household survey
constitutes  an  evolutionary  program  which  may  be 
expected  to  change  as  experience  accumulates  and 
which,  at  any  given  time,  is  expected  to  fulfill  only 
those  objectives  of  the  National  Health  Survey  for 
which  it  is  the  most  appropriate  vehicle. 

Substantive  findings  from  the  household-inter- 
view survey  are  being  published  by  the  Public 
Health  Service  in  a sequence  of  numbered  docu- 
ments identified  as  Health  Statistics,  Series  B. 
Technical  reports  and  methodological  studies  are 
issued  in  Series  A,  and  include  this  report  on  sta- 
tistical design. 

Arrangement  of  material  in  the  present  re- 
port is  intended  to  facilitate  use  by  two  different 
groups  of  readers.  It  is  hoped  that  the  body  of  the 
report  will  be  of  interest  to  and  readily  readable 
by  all  professional  persons  concerned  with  health 
problems  and  those  interested  in  research  methods. 
Several  technical  appendices  have  been  added  for 
the  benefit  of  statisticians,  but  contain  material 
which  may  be  informative  for  a wider  audience. 


2.  BACKGROUND  AND  OBJECTIVES 


History 


A detailed account was given in the first publication
in this Series (reference 1) of the background, need
for,  purposes,  and  expected  product  of  the  U.  S. 
National  Health  Survey.  That  story  will  not  be  du- 
plicated here.  However,  it  may  be  helpful  to  recall 
very  briefly  a few  highlights  of  the  period  which 
preceded  initiation  of  the  operating  program  in  the 
middle  of  1957. 

By  1957  it  had  been  more  than  20  years  since 
the last major survey to obtain comprehensive statistics
on diseases, injuries, and impairments in
the  general  population  of  the  United  States.  Carried 
out  in  1935-36,  that  survey  was  a major  project  in 
which  737,000  urban  households  were  visited  by 
interviewers  to  obtain  data  on  morbidity,  impair- 
ments, and  health  characteristics.  It  remains  a 
landmark  in  the  field. 

In  the  years  since  1936  there  have  been  a num- 
ber of  community  studies  of  morbidity,  prominent 
among  which  are  the  names  of  Hagerstown  and 
Baltimore, Md.; Pittsburgh, Pa.; Hunterdon County,
N. J.; Kansas City, Mo.; New York City; and California
(both San Jose and a statewide study). These
studies, as well as occasional experiments with
supplements  to  the  Census  Bureau’s  Current  Pop- 
ulation Survey,  demonstrated  that  the  interview 
method  is  capable  of  providing  useful  information 
about  the  amount  and  distribution  of  diseases  and 
injuries  together  with  related  information  such  as 
the  accompanying  loss  of  time  from  work  or  other 
usual  activities. 

In  January  1949,  the  U.  S.  National  Committee 
on  Vital  and  Health  Statistics  was  established.  Sub- 
committees were  established  in  December  1949  and 
October  1950  to  study  the  needs  for  current  mor- 
bidity statistics.  As  a result  of  their  recommenda- 
tions, a third  Subcommittee  was  established  in 
February  1951  under  the  chairmanship  of  Dr.  W. 
Thurber  Fales  of  Johns  Hopkins  University,  and 
instructed to draft a "Plan for a national morbidity
survey  keeping  in  view  the  interests  of  local  areas." 
After  careful  study,  this  Subcommittee  recom- 
mended that  several  steps  be  taken,  and  in  partic- 
ular: "That  a continuing  national  morbidity  survey 
be  conducted  ....  Its  purpose  would  be  to  obtain 
data  on  the  prevalence  and  incidence  of  disease, 
injuries,  and  impairments,  on  the  nature  and  du- 
ration of  the  resulting  disability,  and  on  the  amount 
and  type  of  medical  care  received.  The  data  would 
be  obtained  from  a probability  sample  of  house- 
holds" (page  28  of  reference  1). 


Public  Law  652  and  NHS  Objectives 

In  the  summer  of  1955,  the  Department  of 
Health,  Education,  and  Welfare  proposed  legisla- 
tion for  a continuing  health  survey,  closely  paral- 
leling recommendations  of  the  Subcommittee.  The 
proposal  was  included  in  the  President's  recom- 
mendations on  health  matters,  received  bipartisan 
support  in  Congress,  was  enacted  into  Public  Law 
652,  84th  Congress,  and  was  signed  by  the  Presi- 
dent on  July  3,  1956.  Later  the  same  month  appro- 
priations were  made  available  for  planning  and 
pretesting  during  the  fiscal  year  ending  June  30, 
1957. 

The  law  authorizes  the  Surgeon  General  of  the 
Public  Health  Service  to  make  continuing  surveys 
and  special  studies  of  the  population  of  the  United 
States  to  determine  the  extent  of  illness  and  disa- 
bility and  related  information  such  as:  the  number, 
age, sex, ability to work or engage in other activities,
and occupation or activities of persons afflicted
with chronic or other disease or injury or
handicapping  condition;  the  type  of  disease  or  in- 
jury or  handicapping  condition  of  each  person  so 
afflicted;  the  length  of  time  that  each  such  person 
has  been  prevented  from  carrying  on  his  occupa- 
tion or  activities;  the  amounts  and  types  of  serv- 
ices received  for  or  because  of  such  conditions; 
and  the  economic  and  other  impacts  of  such  con- 
ditions. 


A significant  feature  of  Public  Law  652  is  that 
it  not  only  provides  that  substantive  data  be  assem- 
bled, but  in  addition,  directs  the  Public  Health 
Service,  "to  develop  and  test  new  or  improved 
methods  for  obtaining  current  data  on  illness  and 
disability  and  related  information." 

Thus  legislative  intent  looks  to  the  establish- 
ment of  health  statistics  as  noted  in  the  law,  and 
foresees ". . . continuing surveys . . . special studies
. . . [and] develop[ing] and test[ing] new and
improved methods" as the objectives of the U. S.
National  Health  Survey. 


Planning  and  Pretesting 
the  Household  Interviews 

Throughout  the  fiscal  year  ending  in  June  1957 
plans  were  developed  for  organizing  and  carrying 
out  the  household  survey  which  had  been  contem- 
plated by  the  Subcommittee  and  authorized  by  Con- 
gress. The  law  contained  the  provision  whereby  the 
program  could  secure  the  assistance  of  other  Fed- 
eral agencies,  as  well  as  private  persons  or  agen- 
cies, in  carrying  out  its  responsibilities.  Under 
this  provision,  the  NHS  made  arrangements  to  uti- 
lize the  very  extensive  resources  and  experience 
of the Bureau of the Census in planning and conducting
the household-interview survey.

From  the  beginning,  it  was  clear  that  the  Na- 
tional Health  Survey  should  be  a general  multipur- 
pose undertaking,  rather  than  a study  with  some 
single  specific  limited  objective.  This  concept 
meant  that  presurvey  planning  was  particularly 
important. It required a careful review of previous
efforts and a weighing of a large number
of possible alternatives, so that the new survey
might be sufficiently comprehensive to cover many
of the desired objectives without being
so diluted as to deal inadequately with all
topics.

By  February  1957,  general  structure  of  the 
survey  had  been  determined,  samples  had  been 
drawn,  and  a tentative  questionnaire  and  field  in- 
structions had  been  drafted.  A pretest  of  1,200 
households  was  conducted  in  Charlotte,  N.  C.,  to 
provide  a complete  field  trial  of  procedure.  The 
pretest  was  used  also  for  training  field  supervisors 
for  the  national  program.  The  next  month  was  de- 
voted to  polishing  the  questionnaire  and  procedures, 
and  to  hiring  and  training  interviewers.  In  the  2 
months  of  May  and  June,  the  entire  nationwide  or- 
ganization went  through  a shakedown  and  training 
period  with  interviewing  and  editing  proceeding 
just  as  though  the  survey  were  in  operation.  Of- 
ficial collection  of  data  began  the  first  week  in 
July  1957. 
3. SUMMARY OF STRUCTURE OF
HEALTH  HOUSEHOLD-INTERVIEW  SURVEY 


Role  of  Interview  Survey 

As  noted  in  the  previous  section,  the  program 
of  the  U.  S.  National  Health  Survey  is  intended  to 
be  an  intensive  and  sustained  undertaking  to  pro- 
vide morbidity  and  health  statistics,  utilizing  what- 
ever resources  and  methods  are  appropriate  to  the 
task.  The  program  is  expected  further  to  evaluate 
existing  sources  and  methods  and  to  develop  new 
methodologies. 

Among  possible  sources  of  data  a prominent 
position  goes  to  medical  and  health  records.  These 
include  such  originating  places  as  hospitals,  physi- 
cians' and  dentists'  offices,  and  insurance  records 
of  several  kinds.  They  include,  too,  reporting  under 
governmental  regulation  of  certain  types  of  mor- 
bidity and  mortality,  and  especially  the  filing  of 
death  certificates. 

Another  potentially  significant  source  of  in- 
formation may  lie  in  samples  of  persons  who  are 
given  clinical  tests  and  measurements  or  general 
health  or  medical  examinations. 

All  these  sources,  and  others,  are  to  be  ex- 
plored by  the  NHS.  Several  pilot  projects  in  these 
areas  already  have  been  initiated. 

However,  a considerable  body  of  opinion  con- 
siders the  household  interview  as  one  of  the  most 
promising  sources  of  data  on  health. 

There  are  limitations  to  the  accuracy  of  diag- 
nostic and  other  information  collected  in  household 
interviews. For diagnostic information the household
respondent can, at best, pass on to the interviewer
only the information the physician has given
to  the  family.  For  conditions  not  medically  attended, 
diagnostic  information  is  often  no  more  than  a de- 
scription of  symptoms.  However,  other  types  of 
facts,  such  as  those  concerning  the  circumstances 
and  consequences  of  illness  or  injury  and  the  re- 
sulting action  taken  or  sought  by  the  individual, 
can  be  obtained  more  accurately  from  household 
members  than  from  any  other  source  since  only 
the  persons  concerned  are  in  a position  to  report 
all of this type of information. Furthermore, this
type of survey greatly facilitates comparison of
the  ill  population  and  the  well  population,  and  as- 
sessment of  relative  impacts  of  a variety  of  ill- 
nesses and  impairments.  The  Health  Household- 
Interview  Survey  described  in  this  report  is  the 
vehicle  being  used  by  the  U.  S.  National  Health 
Survey  to  produce  data  presently  believed  to  be 
most  appropriately  obtained  from  members  of  the 
household. 


Evolutionary  Pattern 

Continuity  and  comparability  of  estimates  for 
different  time  periods  are  desirable  objectives, 
and  will  be  given  attention  in  the  interview  survey, 
especially when changes are proposed, but they will
not  have  overriding  priority.  A substantial  portion 
of  resources  and  energy  of  the  NHS,  at  least  dur- 
ing its  early  years,  is  to  be  devoted  to  studies  and 
evaluation  of  quality  of  data  input,  to  efficiency  of 
collection  and  processing,  and  to  usefulness  of  out- 
put. It  is  expected  that  these  activities,  augmented 
by  the  active  and  constructive  criticism  of  users, 
will  lead  to  a program  which  is  changing  in  re- 
sponse to  need  in  scope,  content,  method,  and  spe- 
cific product. 

Although  the  interview  survey  has  only  had  1 
full  year  of  operation,  already  changes  have  been 
made  in  sample  design,  questionnaire,  and  collec- 
tion and  processing  procedures.  The  description 
given  in  the  following  pages  is  in  all  major  re- 
spects that  which  was  in  effect  through  the  first 
year  of  operation,  although  minor  changes  occurred 
from  one  quarter  to  another.  Quantitative  refer- 
ences such  as  sample  sizes  and  noninterview  rates 
apply  for  the  most  part  specifically  to  experience 
in  the  first  2 quarters  of  operation. 


The  Questionnaire 

The  questionnaire  is  a 9-part  document  which 
is  handled  by  the  interviewer  rather  than  the  re- 
spondent, and  on  which  the  interviewer  transcribes 
replies of the respondent. Most replies can be recorded
by checking proper boxes on the form. The
text  of  the  questionnaire  is  supplemented  by  6 check 
list  cards  which  are  shown  to  the  respondent  at 
appropriate  points  in  the  interview.  The  check  lists 
clarify  certain  questions  so  as  to  aid  the  respond- 
ent in  understanding  types  of  answers  required  and 
in  recalling  specific  experiences. 

Physically,  the  questionnaire  is  of  the  book 
type,  providing  separate  columns  for  each  of  7 
possible  members  of  a household.  If  a household 
contains  more  than  7 members,  more  than  1 ques- 
tionnaire is  used. 
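
The rule for large households can be illustrated with a short calculation. This is only a sketch: the function name and the constant are hypothetical labels introduced here, and the only fact taken from the text is that one questionnaire provides columns for 7 household members.

```python
import math

# One column per household member, 7 columns per book-type questionnaire.
COLUMNS_PER_QUESTIONNAIRE = 7


def questionnaires_needed(household_size: int) -> int:
    """Return how many questionnaires are needed to cover one household."""
    if household_size < 1:
        raise ValueError("a household has at least one member")
    # Any members beyond a multiple of 7 spill onto one additional form.
    return math.ceil(household_size / COLUMNS_PER_QUESTIONNAIRE)


if __name__ == "__main__":
    for size in (1, 7, 8, 15):
        print(size, questionnaires_needed(size))
```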

A facsimile  of  the  questionnaire  is  contained 
in  Appendix  I. 

It  is  planned  that  items  on  the  questionnaire 
may be divided into 2 groups, not separately exhibited
in the present format. One group consists
of  a core  of  basic  questions  which  will  be  retained 
in  relatively  unchanged  form  over  an  extended 
period  of  time.  The  second  group  consists  of  sup- 
plementary questions  which  will  be  included  tem- 
porarily for  blocks  of  1 or  a few  calendar  quar- 
ters. This  general  plan  provides  for  the  retention 
of  regular  series  of  basic  statistics,  and  at  the 
same  time  permits  flexibility  in  securing  occa- 
sional measures  of  a wider  class  of  phenomena. 

As  initially  used,  the  questionnaire  carries  40 
items  for  identification  of  households  and  persons 
and  socioeconomic  description  of  respondents.  (A 
question  to  which  the  interviewer  must  secure  an 
answer  is  interpreted  as  one  item  in  this  count. 
The  same  interpretation  applies  in  the  following 
counts.)  It  includes  12  general  questions  on  the 
presence or absence of illness, accidents, impairments,
or conditions for each member of the household,
and 54 detailed questions for each person
for whom the questions are appropriate, on details
of illnesses, accidents, and impairments,
and  on  medical,  dental,  and  hospital  care.  For 
most  questions,  the  recall  period  is  the  previous 
2 weeks.  But  for  some  items  of  low  incidence,  for 
which  memory  is  reliable,  such  as  hospitaliza- 
tions, the  recall  extends  over  the  year  previous 
to  the  interview. 

Interviewing  is  conducted  in  the  home,  when- 
ever possible  with  the  individual  person  if  over  18 
years  of  age,  and  otherwise  with  a responsible 
adult  member  of  the  family. 

A separate  report  on  the  questionnaire  is  in 
preparation.  It  will  treat  more  thoroughly  the  def- 
initions, concepts,  scope,  and  content  of  the  sched- 
ule. In  addition,  each  report  issued  on  a substantive 
health  topic  treats  that  part  of  the  questionnaire 
which  applies  most  directly  to  the  topic  under  study. 


Sample  Design,  Survey  Methods, 
and  Estimation 

The  sampling  plan  of  the  survey  follows  a high- 
ly stratified  multistage  probability  design  which 
permits a continuous sampling of the civilian population
of the United States. The first stage of the
design  consists  of  an  area  sample  of  372  from 
among  about  1,900  geographically  defined  primary 
sampling  units  (PSU's)  into  which  the  continental 
United  States  has  been  divided.  A PSU  is  a county, 
a group  of  contiguous  counties,  or  a Standard  Met- 
ropolitan Area. 

With no loss in general understanding, the remaining
stages, which consist of a series of samplings
of successively smaller parcels of land,
can be telescoped and treated at this point in the
report as an ultimate stage. Within PSU's, then,
ultimate-stage units called segments are defined,
also  geographically,  in  such  a manner  that  each 
segment  contains  an  expected  6 households  in  the 
sample.  For  each  week  a random  sample  of  about 
120  segments  is  drawn.  Persons  in  the  approxi- 
mately 700  households  in  those  segments  are  inter- 
viewed concerning  illnesses,  injuries,  chronic  con- 
ditions, disability,  and  other  factors  related  to 
health. 

Household  members  interviewed  each  week  are 
an  independent  representative  sample  of  the  pop- 
ulation, so  that  samples  for  successive  weeks  can 
be  combined  into  larger  samples  for,  say,  a calendar 
quarter  or  a year.  Thus,  the  design  permits  both 
continuous  measurement  of  characteristics  of  high 
incidence  or  prevalence  and,  through  the  larger 
consolidated  samples,  more  detailed  analysis  of 
less  common  characteristics  and  smaller  cate- 
gories. 
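
The gain from consolidating weekly samples can be sketched numerically. The calculation below is a simplified illustration, not the variance procedure of the survey. It assumes simple random sampling, which ignores the clustering of the actual design, and it assumes about 2,200 persons per weekly sample, a round figure implied by roughly 700 households of about 3 persons each. All names are hypothetical.

```python
import math

# Assumption: about 700 households per week at roughly 3 persons each.
PERSONS_PER_WEEK = 2_200


def srs_standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


def pooled_standard_error(p: float, weeks: int) -> float:
    """Standard error when `weeks` independent weekly samples are combined."""
    return srs_standard_error(p, PERSONS_PER_WEEK * weeks)


if __name__ == "__main__":
    # A characteristic affecting 1 percent of persons, for illustration only.
    for label, weeks in (("week", 1), ("quarter", 13), ("year", 52)):
        print(f"{label:8s} {pooled_standard_error(0.01, weeks):.5f}")
```

Under these assumptions the standard error falls with the square root of the number of weeks combined, which is why less common characteristics are better analyzed from quarterly or annual consolidations.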

The national sample plan over a 12-month
period includes approximately 115,000 persons from
some  36,000  households  in  about  6,000  segments, 
with  representation  from  every  state.  The  design 
is  such  that  tabulations  can  be  provided  from  the 
annual  sample  for  various  geographic  sections  of 
the  United  States  and  for  metropolitan,  urban,  and 
rural  sectors  of  the  Nation. 
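
The weekly and annual sample sizes quoted above can be cross-checked with simple arithmetic. The sketch below assumes interviewing in all 52 weeks of the year, which the text does not state explicitly. The quoted totals are rounded, so only approximate agreement is to be expected. Variable names are hypothetical.

```python
# Figures quoted in the text.
SEGMENTS_PER_WEEK = 120
HOUSEHOLDS_PER_SEGMENT = 6      # expected households per segment
SAMPLE_PSUS = 372
TOTAL_PSUS = 1_900              # "about 1,900"
ANNUAL_PERSONS = 115_000        # "approximately 115,000"
ANNUAL_HOUSEHOLDS = 36_000      # "some 36,000"

WEEKS_PER_YEAR = 52             # assumption: interviewing in every week

# 120 * 6 = 720 expected households per week (text: approximately 700).
households_per_week = SEGMENTS_PER_WEEK * HOUSEHOLDS_PER_SEGMENT
# 120 * 52 = 6,240 segments per year (text: about 6,000).
segments_per_year = SEGMENTS_PER_WEEK * WEEKS_PER_YEAR
# 720 * 52 = 37,440 expected households per year (text: some 36,000).
households_per_year = households_per_week * WEEKS_PER_YEAR

# Quantities implied by the quoted totals.
persons_per_household = ANNUAL_PERSONS / ANNUAL_HOUSEHOLDS
first_stage_fraction = SAMPLE_PSUS / TOTAL_PSUS

if __name__ == "__main__":
    print("households per week:", households_per_week)
    print("segments per year:", segments_per_year)
    print("households per year:", households_per_year)
    print("persons per household:", round(persons_per_household, 2))
    print("share of PSU's in sample:", round(first_stage_fraction, 3))
```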

Estimation  is  accomplished  by  a technique 
which  insures  that  sample  results  are  consistent 
with  official  Census  Bureau  estimates  of  current 
population  by  age,  sex,  and  color,  and  which  se- 
cures significant  reductions  in  sampling  variance. 
Technically,  this  procedure  is  a 2-stage  ratio  es- 
timation. Subsequent  sections  in  the  body  of  this 
report  and  in  the  Appendices  describe  leading  fea- 
tures of  the  design  in  greater  detail. 
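
The general idea of ratio adjustment to independent population figures can be sketched as follows. This is not the 2-stage procedure of the survey, which is described in later sections and the Appendices. It shows only a single adjustment of sample weights so that they agree with outside population estimates within each cell (for example, an age-sex-color class). The function name, the cell labels, and all numbers are hypothetical.

```python
def ratio_adjusted_total(sample, controls):
    """Estimate a population total after ratio adjustment to cell controls.

    sample   -- list of (cell, base_weight, value) tuples, one per sample person
    controls -- dict mapping each cell to an independent population estimate
    """
    # Population of each cell as estimated from the sample weights alone.
    estimated_population = {}
    for cell, weight, _ in sample:
        estimated_population[cell] = estimated_population.get(cell, 0.0) + weight

    total = 0.0
    for cell, weight, value in sample:
        # The factor scales the cell so its weights sum to the control figure.
        factor = controls[cell] / estimated_population[cell]
        total += weight * factor * value
    return total


if __name__ == "__main__":
    # Made-up data: value is 1 if the person reports a condition, else 0.
    sample = [
        ("A", 1000.0, 1), ("A", 1000.0, 0),
        ("B", 1000.0, 1), ("B", 1000.0, 1), ("B", 1000.0, 0),
    ]
    controls = {"A": 2500.0, "B": 2700.0}
    print(ratio_adjusted_total(sample, controls))
```

In this made-up example the sample underestimates cell A and overestimates cell B, so persons in A are weighted up and persons in B are weighted down before the total is formed.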


4.  SURVEY  PROCEDURE 


Collection  of  Data 


Data  are  collected  through  a household  inter- 
view. Over  the  Nation  there  are  120  interviewers, 
trained,  directed,  and  guided  by  17  supervisors 
located  in  Census  Bureau  Regional  Offices.  The 
supervisors are career Civil Service employees
whose  prime  responsibility  is  the  National  Health 
Survey.  They  have  administrative  and  clerical 
support  from  the  Census  Bureau  field  organization, 
and  direct  technical  guidance  from  a Health  Sta- 
tistics Branch  in  the  Washington  office  of  the  Cen- 
sus Bureau. 


The  interviewers  (initially  all  women)  are 
part-time  employees,  selected  through  an  exami- 
nation and  testing  process  which  is  administered 
by  the  supervisors,  according  to  specifications  set 
in  Washington.  The  amount  of  work  done  by  an  in- 
terviewer varies  depending  on  density  of  the  sam- 
ple near  her  home  location.  A typical  interviewer 
may  have  26  assignments  in  a year,  or  an  average 
of  1 assignment  each  2 weeks.  Usually  an  assign- 
ment consists  of  interviews  in  approximately  12 
households. Including training, travel, and callbacks,
the typical interviewer is employed an average
of 12 hours per week.
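
The workload figures for the typical interviewer can be checked against the annual sample size given elsewhere in this report. The sketch below treats every interviewer as typical and assumes a 52-week year. Both are simplifications, and the variable names are hypothetical.

```python
# Figures quoted in the text for the typical interviewer.
INTERVIEWERS = 120
ASSIGNMENTS_PER_YEAR = 26
HOUSEHOLDS_PER_ASSIGNMENT = 12   # "approximately 12"
HOURS_PER_WEEK = 12              # includes training, travel, and callbacks

WEEKS_PER_YEAR = 52              # assumption

# 120 * 26 * 12 = 37,440 households, close to the "some 36,000" annual sample.
households_per_year = INTERVIEWERS * ASSIGNMENTS_PER_YEAR * HOUSEHOLDS_PER_ASSIGNMENT

# 12 hours * 52 weeks spread over 26 assignments gives 24 hours per assignment.
hours_per_assignment = HOURS_PER_WEEK * WEEKS_PER_YEAR / ASSIGNMENTS_PER_YEAR

# 24 hours over 12 households gives 2 hours per household, all activities included.
hours_per_household = hours_per_assignment / HOUSEHOLDS_PER_ASSIGNMENT

if __name__ == "__main__":
    print(households_per_year, hours_per_assignment, hours_per_household)
```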

Training  for  both  supervisors  and  interviewers 
is  a process  for  improving  and  controlling  the 
interview  and  data  from  it.  As  such,  it  is  a pro- 
cedure, parts  of  which  must  continue  throughout 
the  life  of  the  survey,  and  is  not  an  activity  which 
could  be  completed  at  the  beginning  of  the  opera- 
tions. 

The  supervisor  is  given  5 kinds  of  training 
beyond  the  Civil  Service  requirements  for  initial 
appointment  to  the  job. 

First,  the  supervisor  is  supplied  with  written 
background  materials  setting  forth  the  history, 
objectives,  and  purposes  of  the  undertaking.  Sim- 
ilarly he  is  given  detailed  instructions  covering 
every  aspect  and  item  of  field  operations.  He  stud- 
ies the  materials,  does  practice  exercises,  and 
takes  written  examinations. 

The  second  block  of  training,  for  the  first 
group  of  supervisors,  was  participation  for  2 weeks 
in the dress-rehearsal pretest of the survey which
took  place  in  Charlotte.  Replacements  have  simi- 
lar experiences  while  serving  as  understudies  to 
another  supervisor. 

The  third  type  of  training  comes  from  the 
continuing  flow  of  written  instructions  and  corre- 
spondence, and  of  evaluations  of  performance  sent 
out  from  Washington.  The  latter  come  from  quality- 
control  and  quality-checking  operations  performed 
in  Washington  as  part  of  the  editing  processes. 

Twice  a year  (3  times  the  first  year)  super- 
visors over  the  Nation  are  assembled  for  a 2-day 
review  of  program  objectives,  new  developments, 
and  selected  procedural  problems.  These  sessions 
permit,  of  course,  a helpful  exchange  of  ideas 
among  supervisors  and  between  the  field  super- 
visors and  the  Washington  staff. 

Finally,  the  supervisor  has  the  advantage  of 
continuing  experience  since  his  regular  job  includes 
the  training  of  interviewers,  observation  of  inter- 
viewing for  new  interviewers,  and  personally  re- 
interviewing a subsample  of  households  as  a part 
of  the  quality-control  program. 

As  stated  above,  the  prospective  interviewer 
is selected through a written examination
and testing for general intelligence and for
aptitude for survey operations which she would be
expected  to  perform.  The  new  interviewer  is  then 
given  a 5-day  initial  course  of  training.  This  course 
consists of 5 types of activity: (1) Instruction from
a field supervisor on purpose and general characteristics
of the survey. (2) A detailed page-by-page
study of all interviewing instructions in which interviewer
and supervisor go through all instructional
material together. (3) Classroom Practice
Exercises,  in  which  the  interviewer  solves  written 
problems  and  with  the  supervisor  subsequently 
determines  correct  answers— these  are  exercises 
rather  than  tests,  and  only  the  interviewer  knows 
definitely  how  well  she  has  succeeded.  (4)  Home 
assignments  which  also  are  written  answers  to 
problems,  which  are  treated  more  in  the  nature  of 
tests  and  in  which  results  are  discussed  by  inter- 
viewer and  supervisor.  (5)  Practice  interviewing 
in  households  under  direct  personal  observation  by 
the  supervisor.  The  study  of  instructions,  the  prac- 
tice exercises,  and  the  home  assignments  are  dis- 
tributed throughout  the  5-day  period. 

If  the  prospective  interviewer  successfully 
completes  the  training  course,  she  begins  opera- 
tional interviewing,  her  first  assignment  being 
carried  out  again  under  direct  personal  observa- 
tion by  the  supervisor. 

After  approximately  1 month,  a new  inter- 
viewer is  given  further  Home  Assignments  which 
again  are  graded  and  discussed,  if  necessary,  by 
the  supervisor.  Subsequently,  in  common  with  all 
interviewers,  she  spends  2 hours  each  month  on 
such  assignments. 

Each  quarter  the  supervisor  recontacts  about 
one  sixth  of  the  households  in  his  part  of  the  sam- 
ple. He  audits  the  household  information  obtained 
earlier  and  reinterviews  independently  one  predes- 
ignated member  of  the  household.  He  compares 
differences  between  the  two  interviews  and  attempts 
to  determine  which  information  is  correct.  These 
reinterviews  are  randomly  distributed  among  the 
interviewers  under  his  supervision  so  that  control 
charts  based  on  about  5 percent  of  an  interviewer's 
work  can  be  maintained.  Each  week,  as  a part  of 
the  editing  process  in  Washington,  error  rates  are 
calculated  separately  for  each  interviewer's  work. 
These  are  transmitted  to  the  appropriate  super- 
visor for  his  use  in  further  training  and  in  tight- 
ening control  over  the  interview  process. 
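
One possible reading of the "about 5 percent" figure is sketched below. The report does not say how that percentage is derived. The calculation assumes that it reflects the share of person-interviews repeated when one sixth of households are recontacted and one member per household is reinterviewed. The average household size is taken from the annual totals quoted elsewhere in this report (about 115,000 persons in 36,000 households). Names are hypothetical.

```python
# Figures from the text.
HOUSEHOLD_RECONTACT_FRACTION = 1 / 6        # "about one sixth of the households"
MEMBERS_REINTERVIEWED_PER_HOUSEHOLD = 1     # "one predesignated member"

# Assumption: average household size implied by the annual sample totals.
PERSONS_PER_HOUSEHOLD = 115_000 / 36_000


def reinterviewed_person_fraction(recontact_fraction: float,
                                  members_reinterviewed: int,
                                  persons_per_household: float) -> float:
    """Share of all person-interviews that are independently repeated."""
    return recontact_fraction * members_reinterviewed / persons_per_household


if __name__ == "__main__":
    share = reinterviewed_person_fraction(
        HOUSEHOLD_RECONTACT_FRACTION,
        MEMBERS_REINTERVIEWED_PER_HOUSEHOLD,
        PERSONS_PER_HOUSEHOLD,
    )
    print(f"{share:.1%}")
```

Under these assumptions the share comes out slightly above 5 percent, which is at least consistent with the figure in the text.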

Two  or  three  times  each  year,  groups  of  in- 
terviewers are  assembled  at  Regional  Offices  for 
1-  or  2-day  refresher  courses  on  objectives,  meth- 
ods, procedures,  and  special  features  of  the  sur- 
vey. 

After a household has been selected for the
sample, a "Dear Friend" letter, signed by the Director,
Bureau of the Census (fig. 1), is addressed
and mailed to the household a few days before the
expected interview. This letter is intended to
be  a general  introduction  to  the  survey,  to  have  the 
effect  of  adding  official  sanction  to  it,  and  to  make 
it  somewhat  easier  for  the  interviewer  to  secure 
an  audience.  If  no  precise  address  is  known,  this 
step  is  foregone. 

THE DIRECTOR

Form-NHS-600
(4-26-57)

Department of Commerce
BUREAU OF THE CENSUS
WASHINGTON 25


Dear Friend:

The Bureau of the Census has been asked by the Public Health
Service to act as its agent to carry out a survey to obtain information
about illnesses, diseases and injuries among residents of this area.
The survey is one part of the National Health Survey Program which
Congress recently authorized because of the need for up-to-date statistics
on the health of our people. Physicians, research workers, and
other groups in health fields are much interested in the knowledge
which will be gained from this survey.

Every month several thousand addresses are chosen to give a
cross-section of the whole United States, and the people at those addresses
are interviewed to obtain the necessary information. This
month the address of your dwelling place is one of those chosen, and
you will be visited by a Census Bureau interviewer within the next
week or two. The interviewer will ask you a number of questions
about the health of the members of your family, particularly about the
illness and injuries you have had in recent weeks. Your cooperation
in helping complete a questionnaire will be very much appreciated.

The information you give will of course be held in confidence.
We have the assurance of the Public Health Service that the information
will be seen only by authorized personnel of the two agencies and
that nothing will be published except statistical summaries in which
no individuals can be identified.


Sincerely yours,


Robert W. Burgess
Director
Bureau of the Census


Figure 1. Introductory letter to prospective respondent.


When the interviewer arrives at the household,
after a very brief introduction, she begins immediately
to ask the survey questions. Each question
is  asked  exactly  as  phrased  on  the  questionnaire. 
Required  information  for  each  person  in  the  house- 
hold is  provided  by  the  person  himself,  if  he  is  a 
responsible  person  18  years  of  age  or  older  and  at 
home  at  the  time  of  the  call;  otherwise  by  a related 
person  who  is  regarded  as  qualified  to  give  accu- 
rate information.  This  definition  of  an  eligible  re- 
spondent is  spelled  out  in  some  detail  in  the  Inter- 
viewers' Manual.  In  summary,  answers  for  chil- 
dren are  given  by  a related  adult;  for  a missing 
adult,  by  wife,  parent,  or  adult  son  or  daughter;  or 
for  an  adult  not  related  to  the  head  of  the  house- 
hold, only  by  himself  or  a related  adult.  Early  ex- 
perience indicates  that  for  persons  over  18  years 
of  age,  58  percent  are  "self-respondents,"  while 
the  remainder  for  whom  another  person  was  the 
informant  are  designated  "proxy-respondents". 
The  interview  averages  40  minutes.  Immediately 
following  the  interview  a "Thank  You"  letter  signed 
by  the  Surgeon  General  of  the  Public  Health  Serv- 
ice is  handed  to  the  respondent  (fig.  2). 

In  order  to  minimize  travel  time,  workloads 
are  so  arranged  that  when  an  interviewer  is  in  a 
neighborhood  for  an  interviewing  assignment,  he 
carries  out  necessary  listing  operations  for  seg- 
ments which  are  in  that  same  neighborhood  and 
which  will  appear  in  samples  for  the  next  2 calen- 
dar quarters.  Appendix  VI  sets  forth  in  some  de- 
tail the  manner  in  which  assignments  are  random- 
ized over  each  quarter  so  that  each  week's  inter- 
viewing constitutes  a random  sample  of  the  popu- 
lation, and  within  reasonable  arrangements  of 
workload  is  widely  diversified  by  geography  and 
interviewer. 

The  following  statistics  for  the  first  6 months 
of  operation  shed  added  light  on  selected  aspects 
of  the  collection  process.  Of  all  addresses  initially 
scheduled  for  inclusion  in  the  sample,  14  percent 
had  become,  by  time  of  call,  what  are  designated 
as  Type  B or  Type  C exclusions,  which  are  types 
of  addresses  which  should  not  be  interviewed: 
dwelling  units  which  are  demolished  or  which  on 
more  careful  inspection  were  found  to  be  outside 
chosen  sample  segments;  households  which  were 
deleted  in  the  field,  according  to  instructions, 
through  subsampling  operations  (details  on  this 
step  are  set  forth  later  in  the  report);  households 
which  were  vacant;  or  households  whose  members 
had  residence  elsewhere.  Of  those  households  in 
which  an  interview  should  have  been  conducted,  6 
percent  were  noninterviews.  One  percent  were  re- 
fusals, and  five  percent  were  not  interviewed  be- 
cause of  all  other  reasons,  but  principally  because 
no  one  was  at  home  after  repeated  call  backs. 
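
The percentages in the preceding paragraph can be chained to see roughly what share of initially scheduled addresses yields a completed interview. The following is a minimal sketch only: it assumes the rounded rates quoted in the text, a hypothetical batch of 1,000 scheduled addresses, and illustrative names (`collection_yield`, `excluded_rate`, `noninterview_rate`) that are not part of the survey procedure.

```python
def collection_yield(scheduled, excluded_rate=0.14, noninterview_rate=0.06):
    """Split scheduled addresses into exclusions, noninterviews, and interviews.

    The default rates are the rounded figures quoted for the first 6 months.
    """
    excluded = scheduled * excluded_rate            # Type B or Type C exclusions
    eligible = scheduled - excluded                 # households that should be interviewed
    noninterviews = eligible * noninterview_rate    # refusals plus all other reasons
    interviewed = eligible - noninterviews
    return {
        "excluded": excluded,
        "eligible": eligible,
        "noninterviews": noninterviews,
        "interviewed": interviewed,
    }


# Hypothetical batch of 1,000 scheduled addresses.
result = collection_yield(1000)
share_interviewed = result["interviewed"] / 1000
```

Under these assumptions about 81 percent of scheduled addresses (0.86 x 0.94) end in a completed interview; the exact share depends on the unrounded rates, which are not given here.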

In  about  63  percent  of  households,  interview- 
ing was  completed  on  the  first  visit.  Percent  of 
households  for  which  various  numbers  of  revisits 
proved  to  be  necessary  are  shown  in  the  following 
breakdown. 


Number of visits          Percent of all
                            households

All cases                       100

1                                63
2                                24
3                                 9
4                                 3
5 or more                         1
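
Two summary figures follow directly from this breakdown: the cumulative percent of households completed by each visit, and a lower bound on the average number of visits per household. The sketch below is illustrative only. It treats the open-ended class "5 or more" as exactly 5 visits, which is an assumption and is why the average is only a lower bound.

```python
# Percent of households by number of visits; key 5 stands for "5 or more".
visits_percent = {1: 63, 2: 24, 3: 9, 4: 3, 5: 1}


def cumulative_completion(dist):
    """Cumulative percent of households completed by each visit number."""
    running = 0
    out = {}
    for visits in sorted(dist):
        running += dist[visits]
        out[visits] = running
    return out


def mean_visits_lower_bound(dist):
    """Average visits per household, counting '5 or more' as exactly 5."""
    return sum(v * p for v, p in dist.items()) / sum(dist.values())


cumulative = cumulative_completion(visits_percent)
lower_bound = mean_visits_lower_bound(visits_percent)
```

On these figures 87 percent of households are completed within 2 visits, and the average is at least 1.55 visits per household.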


Editing  and  Processing 

The interview is recorded initially in the book
questionnaire, Form NHS-1. This form is reviewed
for  completeness  and  proper  identification  of  per- 
son and  household,  but  otherwise  not  edited  by  the 
supervisor  in  the  Census  Regional  Office.  Reports 
are  batched  and  transmitted  to  the  Census  Bureau 
in  Washington  for  editing  and  further  processing. 

In  Washington,  certain  control  operations  are 
performed,  reported  information  is  coded  with 
special  attention  being  given  to  medical  coding, 
and  to  adequacy  of  data  for  medical  coding  (editing 
reports  on  inadequate  information  are  returned  to 
Regional  Offices  for  future  use  in  training  and  in- 
terviewer control),  and  the  data  are  transcribed  to 
document-sensed  cards  and  then  to  punch  cards. 
These  cards  are  processed  on  conventional  punch- 
card  equipment  mainly  for  purposes  of  interviewer 
control  and  for  a more  thorough  check  for  com- 
pleteness of  entries.  Rejects  are  returned  to  clerks 
for  review  and  correction.  Corrections  and  addi- 
tions are  punched  and  added  to  the  deck.  Informa- 
tion on  cards  is  then  transferred  to  magnetic  tape, 
and further processing is handled on Univac electronic
computers.

The  computer  carries  out  4 basic  opera- 
tions: (1)  an  edit  of  the  raw  reports;  (2)  the  gener- 
ation of  data  from  edited  reports  (e.  g.,  by  counting 
number  of  chronic  conditions  reported  for  a per- 
son, to  generate  the  statistic  "number  of  chronic 
conditions  reported  for  a person");  (3)  estimation 
of  specified  statistics,  including  all  necessary 
computational  steps  such  as  insertion  of  sampling 
rates  and  adjustment  for  noninterview;  and  (4)  ar- 
rangement of  estimates  into  derived  statistical 
tables. 
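
Operation (2), the generation of data from edited reports, can be illustrated with the example the text itself gives: counting the chronic conditions reported for each person. The sketch below is not the Univac routine. The record layout (a `person_id` key and a `chronic` flag on each condition record) and the function name are hypothetical, chosen only for illustration.

```python
from collections import Counter


def chronic_condition_counts(person_ids, condition_records):
    """Derive 'number of chronic conditions reported' for every person.

    Persons with no chronic condition record receive a count of 0.
    """
    counts = Counter(
        rec["person_id"] for rec in condition_records if rec["chronic"]
    )
    return {pid: counts.get(pid, 0) for pid in person_ids}


# Hypothetical edited reports: three persons, three condition records.
persons = [1, 2, 3]
conditions = [
    {"person_id": 1, "chronic": True},
    {"person_id": 1, "chronic": True},
    {"person_id": 2, "chronic": False},
]
derived = chronic_condition_counts(persons, conditions)
```

The derived count is carried for every person, including those with no condition record, so that later tabulations can classify the whole sample.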

As  for  any  job  of  processing  and  editing  re- 
turns in  a sizeable  survey,  a myriad  of  steps  is 
necessary.  Most  of  these  need  no  mention  in  this 
account.  A few  circumstances  are  worth  noting. 

Information  moves  through  4 separate  chan- 
nels in  processing,  each  channel  being  identified 
as  a card,  and  each  card  containing  the  class  of 
information  indicated  by  its  title.  The  four  chan- 
nels are  household  cards,  person  cards,  condition 
cards,  and  hospital  cards. 

. . . for the assistance you have given the Census Bureau
interviewer who just visited you.

It is only through the cooperation of you and others who are
being visited that a health survey such as this one can be
carried on, and we thought you would like to know how the
information you have given will be used.

It will, of course, be held in confidence. When combined with
information given by other persons in this and other
communities, it will reflect health conditions throughout the
United States and provide new knowledge to improve the
health of the American people. It is because such knowledge
is now lacking that Congress recently authorized the National
Health Survey, of which the interviewing in this area is a
part.

The National Health Survey will be collecting information on
other aspects of health, and it is possible that we may wish
to ask for your further cooperation at some time in the
future. Meanwhile, thank you for your help today.

Surgeon General, Public Health Service

Figure 2. Letter of appreciation to respondent.

In nearly all surveys the choice of definitions
and of categorizing devices is critical to the
undertaking. This statement applies with special
force  to  the  Health  Household-Interview  Survey. 
In  many  ways  classification  determines  the  scope 
of  the  project,  decides  whether  data  on  a particu- 
lar topic  will  result,  controls  the  meaningfulness 
of  chosen  blocks  of  information,  influences  the 
presence  or  absence  of  bias  in  the  measurement 
process,  and  generally  conditions  the  utility  of 
survey  results.  For  these  reasons,  and  because 
adequate  treatment  of  this  one  matter  is  lengthy, 
a separate  report  on  definitions  and  classifications 
is being prepared for issue in this series. Only a
few remarks are included here.

Wherever  possible,  standard  definitions  and 
classifications  have  been  employed.  Thus,  "dwel- 
ling unit,"  "household,"  "Standard  Metropolitan 
Area,"  "family,"  and  many  other  terms  are  de- 
fined as  in  the  Decennial  Population  Census  or 
other  widely  accepted  operations.  Similarly,  the 
International  Statistical  Classification  of  Diseases, 
Injuries,  and  Causes  of  Death  is  the  basis  of  clas- 
sifying health  conditions;  demographic,  social,  and 
economic  measures  have  been  grouped  into  classes 
which  conform,  it  is  believed,  to  most  common 
practice. 

Classifications  have  been  predesignated  and 
are  fixed,  and  the  questionnaire  is  geared  in  most 
areas  to  such  a system.  Very  little  latitude  is  given 
the  respondent  to  create  new  classes  through  his 
replies,  but  translating  replies  into  a specific  code 
is  still  a relatively  difficult  process,  especially  for 
the medical coding.

Accordingly,  medical  coding  for  each  condi- 
tion initially  has  been  done  independently  by  2 
coders,  with  differences  being  umpired  by  a coding 
expert. Information concerning the nature of coding
difficulties is being assembled to permit partial
verification and process-control techniques
for the medical coding.
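
The independent double coding described above can be sketched as follows. This is an illustration under stated assumptions, not the Bureau's procedure: the codes are placeholder strings rather than real classification codes, and `reconcile_codes` and the `umpire` callback are hypothetical names standing in for the coding expert's decision.

```python
def reconcile_codes(coder_a, coder_b, umpire):
    """Compare two independent codings and refer differences to an umpire.

    coder_a, coder_b: dicts mapping condition id -> assigned code.
    umpire: callable(condition_id, code_a, code_b) -> final code.
    Returns (final codes, list of disputed condition ids, disagreement rate).
    """
    final = {}
    disputed = []
    for cond_id, code_a in coder_a.items():
        code_b = coder_b[cond_id]
        if code_a == code_b:
            final[cond_id] = code_a
        else:
            disputed.append(cond_id)
            final[cond_id] = umpire(cond_id, code_a, code_b)
    rate = len(disputed) / len(coder_a) if coder_a else 0.0
    return final, disputed, rate


# Placeholder codes; the umpire here simply sides with the first coder.
a = {"c1": "X10", "c2": "X20", "c3": "X30", "c4": "X40"}
b = {"c1": "X10", "c2": "X21", "c3": "X30", "c4": "X40"}
final, disputed, rate = reconcile_codes(a, b, lambda cid, x, y: x)
```

A running record of the disputed items and of the disagreement rate is the kind of information that would support a later move from complete duplication to partial verification.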

The  use  of  punch-card  equipment  for  early 
phases  of  editing  was  dictated  at  first  by  consider- 
ation of  workloads  on  available  equipment,  and  of 
start-stop  requirements  for  some  of  the  steps. 
Some  of  these  operations  have  been  shifted  to  the 
computer,  and  others  will  be  later. 

When  reports  are  received  in  Washington,  they 
go  through  a control  procedure  at  a "control  desk." 
This  procedure,  in  addition  to  routine  housekeep- 
ing checks,  includes  3 operations  with  statistical 
significance. 

1.  In  the  prelisting  step,  an  "expected"  num- 
ber of  households  in  each  segment  was  determined 
in  the  field  and  reported  to  Washington.  Incoming 
reports  must  account  for  the  same  number  of 
households  or  explain  discrepancies.  Any  segment 
for  which  an  unexplained  discrepancy  is  found  is 
reconciled  through  recontact  with  the  Census  Bu- 
reau Regional  Office  if  time  permits.  In  instances 
in  which  tabulation  cutoff  time  prevents  this,  the 
case  is  later  called  to  the  attention  of  the  Regional 
Office  so  that  it  will  be  used  in  initiating  neces- 
sary tightening  of  supervisory  controls  on  listing, 
interviewing,  and  clerical  operations. 


2.  In  some  segments  it  will  have  been  found, 
either  earlier  in  Washington  or  in  the  field,  that 
a chosen  segment  clearly  contained  more  than  20 
households.  In  these  cases  the  segment  was  sub- 
sampled so  that  the  final  subsample  contained 
roughly  6 households.  The  subsampling  fraction  is 
noted  at  the  Washington  control  desk  and  an  ad- 
justing order  is  transmitted  to  the  computer. 

3.  A third  type  of  review  at  the  control  desk 
adjusts  the  sampling  fraction  for  households  and 
persons  from  special  dwelling  places,  such  as  re- 
formatories, homes  for  the  aged,  or  hotels  for 
transients. 

The  general  purpose  of  all  operations  at  the 
Washington  control  desk  is  to  assure  that  data 
moving  into  the  editing  and  tabulating  stream  are, 
with  respect  to  coverage  and  weighting,  in  agree- 
ment with  the  survey  design,  within  narrow  toler- 
ances of  error. 
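
The weighting adjustment for a subsampled segment (operation 2 above) amounts to multiplying each retained household by the inverse of the subsampling fraction. The sketch below assumes a systematic take-every-k-th selection, which the text does not specify. The function name, the `start` parameter, and the rounding of the interval are likewise assumptions made for illustration.

```python
def subsample_segment(households, threshold=20, target=6, start=0):
    """Return (households to interview, weight multiplier for the segment).

    Segments with no more than `threshold` households are taken in full.
    Larger segments are thinned systematically to roughly `target` households.
    """
    n = len(households)
    if n <= threshold:
        return list(households), 1
    # Take every k-th household, so the subsampling fraction is 1/k.
    interval = max(1, round(n / target))
    chosen = list(households[start % interval::interval])
    # Each retained household stands for k listed households.
    return chosen, interval


# Hypothetical segment in which 30 households were listed.
listed = list(range(30))
chosen, weight = subsample_segment(listed)
small_chosen, small_weight = subsample_segment(list(range(8)))
```

In practice the starting point would be chosen at random; it is fixed at 0 here only so that the example is reproducible.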


Evaluation  and  Control  of  Data 

A substantial  proportion  of  total  resources  of 
the  household-interview  survey  is  devoted  to  con- 
trol, evaluation,  and  improvement  of  quality  of  data 
input. 

Four very broad areas of activity also bear on
the quality of data. They are parts of the
U. S. National Health Survey but are not discussed
here; they are listed by title only to place in
perspective the devices for control and improvement
of data which are described in the following
paragraphs. The 4 areas are (1) the over-all
survey  design,  including  concepts,  definitions,  and 
general  plan  of  operation;  (2)  operating  control, 
in  the  sense  of  maintaining  general  adherence  to 
design,  including  proper  use  of  training  and  super- 
vision; (3)  utilization  of  comparative  analysis 
of  data,  including  external  checks  against  other 
sources  of  health  information,  and  especially 
against  medical  and  health  records;  and  (4)  pro- 
vision for  organized  outside  review  and  criticism 
of  both  methods  and  products  through  the  creation 
of  both  governmental  and  nongovernmental  advi- 
sory committees,  and  the  use  of  consultants. 

Aside,  then,  from  these  broad  areas  just 
named,  there  are  3 types  of  operation  which  are 
integral  parts  of  survey  procedure,  and  which  are 
principally  devices  for  control  and  improvement 
of  quality  of  data. 

The  Reinterview  Procedure.— Already  men- 
tioned, in  connection  with  training  and  supervision 
of  interviewers,  is  the  reinterview  program.  The 
supervisor  regularly  recontacts  about  one  sixth 
of  the  households  in  his  part  of  the  sample,  and 
thus,  about  one  sixth  of  the  households  assigned 
to  each  interviewer.  The  supervisor  audits  the 
household  information  previously  secured  by  the 
original interviewer, and reinterviews 1 predesignated
member of the household. Three main
purposes, and several lesser objectives, are
served  by  this  procedure.  The  first  purpose  is  that 
of  training  and  quality  control  for  the  interviewers. 
The  second  is  measurement  of  interviewer  varia- 
bility. The  third  is  detection  of  interviewer  bias — 
to  the  extent  that  the  more  expert  supervisor  can 
discover  it.  This  means  embracing  the  assumption 
that  the  supervisor,  using  the  same  questionnaire 
and  the  same  procedure  that  were  used  by  the  ini- 
tial interviewer,  but  being  more  thoroughly  trained, 
will  secure  data  which  may  be  considered  the 
standard  which  the  interviewer  should  have  met — 
and  possibly  did.  On  the  reinterview,  all  adults 
must  be  interviewed  as  self-respondents  rather 
than  proxy-respondents.  Thus  there  is  a component 
of  variance  from  self-  vs.  proxy-respondent  as 
source.  This  component  has  some  diluting  effect 
on  measurement  of  interviewer  contributions  to 
bias  and  variance,  but  its  existence  may  make  it 
possible  to  determine  the  extent  of  bias  (if  any) 
caused  by  the  proxy-respondent. 

Processing checks and controls.-At each principal
processing step, controls are established
either by verification through duplicate processing
or by sample verification techniques, based primarily
on process control. Thus far, error rates
beyond preliminary standards are called to the
attention of the responsible operating supervisor,
with a recommendation that steps be taken to reduce
the error rate. Further study of error rates
and of their probable impact on estimates is expected
to lead to a better-balanced set of standards.
It will lead also to greater use of sample
verification and to reductions in 100 percent
duplication of editing steps.

Internal editing and consistency checks.-Reference
was made earlier, and will be made again
when the details of estimation in the survey are
discussed, to editing routines designed to make
questionnaires internally consistent, and to eliminate
"impossible" responses. This is an area in
which  the  number  and  type  of  possible  checks  are 
unlimited.  Experience  must  be  the  guide  in  de- 
ciding how  much  editing  is  profitable.  As  implied 
earlier,  the  first  objectives  are  to  insure  that  data 
are  consistent  and  not  obviously  incorrect.  More 
penetrating  edits  are  to  be  tested. 
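
A consistency edit of the kind described can be sketched as a short list of rules applied to each record. Everything specific in the example is hypothetical: the field names (`age`, `bed_days`, `restricted_days`), the 14-day recall period, and the three rules are assumptions chosen for illustration and are not taken from the survey's edit specifications.

```python
def edit_person_record(record, recall_days=14):
    """Return the list of edit failures for one hypothetical person record."""
    failures = []
    # Range check: an "impossible" value.
    if not 0 <= record["age"] <= 120:
        failures.append("age out of range")
    # Range check against the assumed recall period.
    if not 0 <= record["bed_days"] <= recall_days:
        failures.append("bed days outside recall period")
    # Internal consistency between two related items.
    if record["bed_days"] > record["restricted_days"]:
        failures.append("bed days exceed restricted-activity days")
    return failures


clean = edit_person_record({"age": 34, "bed_days": 2, "restricted_days": 5})
flagged = edit_person_record({"age": 34, "bed_days": 20, "restricted_days": 5})
```

Each added rule catches more errors but also adds cost and the risk of rejecting valid reports, which is the balance the text leaves to experience.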


5.  SAMPLE  DESIGN 


The Multistage Design

As  noted  in  the  summary  on  page  4 of  this 
report, the Health Household-Interview Survey
rests  on  a highly  stratified,  constructively  2-stage 
probability  design.  Actual  selection  of  sample  units 
takes  place  in  a multistage  process,  which  is  mod- 
ified further  by  the  use  of  3 selection  zones  and 
41  subuniverses.  The  design  is  termed  "construc- 
tively 2-stage"  because  the  first  sampling  step  is 
the  selection  of  372  primary  sampling  units  from 
among  some  1,900  areas  into  which  the  country 
has  been  divided,  while  the  remaining  steps  lead 
effectively  to  a second  or  ultimate  sampling  stage 
in  which  small  segments  or  clusters  of  an  expected 
6 households  are  chosen  for  inclusion  in  the  sam- 
ple from  within  the  PSU's  selected  in  the  first 
step. 

The  following  paragraphs  describe  principal 
features  of  the  design,  and  the  manner  in  which  the 
sample  was  drawn.  Additional  technical  notes  on 
selected  aspects  of  the  design  are  included  in  Ap- 
pendices II  through  VII.  In  particular,  algebraic 
statements  of  the  estimating  and  variance  equa- 
tions are  given  in  Appendices  II  and  III.  Still  fur- 
ther insight  on  the  topic  can  be  gained  from  con- 
sulting Chapters  7,  8,  9,  and  12  and  Appendix  B of 
reference  2,  since  much  of  the  theory  underlying 
the  sample  design  of  the  health  survey  is  set  forth 
in  this  book. 


Primary  sampling  units.-The  PSU  is  a county, 
a group  of  contiguous  counties,  or  a Standard  Met- 
ropolitan Area.  A total  of  1,900  PSU's  exhaust  the 
land  area  of  the  continental  United  States.  Forma- 
tion of  such  PSU's  is  an  art  rather  than  a science, 
although  several  clear-cut  principles  and  rules 
were  helpful.  Prominent  among  these  are  the  fol- 
lowing 4: 

1.  PSU's  should  be  units  for  which  a wide  va- 
riety of  descriptive  statistics  is  available,  since 
this  permits  the  PSU's  to  be  stratified  or  classi- 
fied in  an  efficient  manner. 

2.  When  the  PSU  is  used  by  a large  surveying 
organization,  there  are  distinct  economies  in  using 
the  same  set  of  PSU's  for  more  than  1 survey. 
Consequently  there  are  advantages  in  having  the 
PSU  conform  to  administrative  structure  in  the 
field,  and  in  having  the  unit  adaptable  to  many  so- 
cial and  economic  objectives. 

3. For technical sampling reasons, the greater
the internal heterogeneity of the PSU, the more
efficient  it  is.  This  principle  tends  to  produce 
physically  large  units. 

4. Contrastingly, costs per ultimate sample
unit (i. e., cluster of sample households) tend to
increase  with  transportation  distances  between  ul- 
timate units  within  a PSU,  and  thus  to  increase 
with  the  size  of  the  PSU.  This  factor  has  limited 
the  size  of  a PSU  to  not  more  than  a few  neigh- 
boring counties. 

The above principles led to formation of the
1,900  PSU's,  which  are  also  used  in  other  Census 
Bureau  surveys,  and  which,  with  a few  exceptions, 
have  these  features:  The  building  block  or  small- 
est structural  component  of  the  PSU  is  a county; 
each  PSU  in  the  Western  United  States  contains  a 
population  of  at  least  7,500  (1950  Census),  and  in 
other  parts  of  the  country  a population  of  at  least 
10,000;  each  western  PSU  contains  not  more  than 
2,000  square  miles  and  other  PSU's  not  more  than 
1,500  square  miles — unless  the  single  county  is 
larger,  which  in  the  West  resulted  in  many  PSU's 
having  less  than  7,500  persons;  and,  with  the  qual- 
ification that  each  Standard  Metropolitan  Area  is 
a PSU,  the  PSU  is  kept  as  internally  contrasting 
as  possible  in  socioeconomic  terms. 

Stratification  of  PSU's.-Sampling  theory  makes 
it  clear  that  if  units  to  be  sampled  can  be  classi- 
fied into  categories  or  strata  whose  members  tend 
to  be  relatively  alike  within  strata  and  relatively 
unlike  between  strata,  and  drawings  made  from 
those  strata,  then  resulting  sampling  variances 
are  reduced  over  those  of  samples  drawn  from  an 
unstratified  universe.  The  PSU's  were  stratified 
accordingly,  the  principal  modes  of  stratification 
being  geographic  location,  density  of  population, 
rate  of  population  growth  between  1940  and  1950, 
proportion of nonwhite, type of industry in predominantly
urban areas, and type of farming in
rural  areas.  The  general  sampling  design  con- 
templated drawing  first-stage  units  with  probabil- 
ity proportionate  to  size,  with  1 PSU  to  be  drawn 
from  each  stratum.  Further,  it  was  desired  that 
separate  estimates  be  obtainable  readily  for  each 
of  41  subuniverses — to  be  further  described  later, 
but  characterized  often  in  the  survey  as  Tab  Areas. 
These  specifications,  augmented  by  an  existing 
stratification  of  the  PSU's,  set  up  by  the  Census 
Bureau  for  other  purposes,  resulted  in  classifica- 
tion of  the  approximately  1,900  PSU's  into  372 
strata.  Further  description  of  the  precise  manner 
in  which  this  was  done  is  given  in  Appendix  IV. 

Drawing  first-stage  units.— From  each  of  the 
372  strata  1 PSU  was  selected  for  inclusion  in  the 
sample  with  probability  proportionate  to  its  1950 
population.  This  meant,  for  example,  that  a small 
PSU  with  50,000  inhabitants  in  1950  had  only  1/20 
as  much  chance  of  inclusion  in  the  sample  as  did 
the  larger  PSU  with  1 million  inhabitants.  These 
differential  sampling  rates  were  of  course  taken 
into  consideration  in  subsequent  sampling  and  es- 
timating steps. 
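
Selection with probability proportionate to size can be sketched as follows. The stratum shown is hypothetical apart from the two populations quoted in the text (50,000 and 1 million), and the use of `random.Random.choices` is simply one convenient way to draw a single unit with weights; it is not a description of the Bureau's selection mechanics.

```python
import random


def inclusion_probabilities(populations):
    """Chance of selection for each PSU when 1 PSU is drawn per stratum."""
    total = sum(populations.values())
    return {psu: pop / total for psu, pop in populations.items()}


def draw_one_pps(populations, rng):
    """Draw 1 PSU from a stratum with probability proportionate to size."""
    psus = list(populations)
    weights = [populations[p] for p in psus]
    return rng.choices(psus, weights=weights, k=1)[0]


# Hypothetical stratum; populations are 1950 counts.
stratum = {"small": 50_000, "large": 1_000_000, "middle": 450_000}
probs = inclusion_probabilities(stratum)
ratio = probs["small"] / probs["large"]       # 1/20, as in the text
selected = draw_one_pps(stratum, random.Random(1957))
```

Because the inclusion probability is known for every PSU, its reciprocal can be carried into the estimating steps to offset the unequal chances of selection.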

As indicated, the selection procedure and the
specification  that  separate  worksheet  estimates 
be  computed  for  each  of  the  Tab  Areas  had  influ- 
enced stratification.  The  Tab  Areas  initially  spec- 
ified were  the  8 largest  Standard  Metropolitan 
Areas,  and  within  each  of  11  geographic  sections, 
the  3 subsections  composed  of  (1)  smaller  Stand- 
ard Metropolitan  Areas,  (2)  other  urban  areas,  and 
(3)  other  rural  areas.  The  sections  and  8 large 
SMA's are shown on the map in figure 3. It should
be understood that separate statistics will not be
published  for  each  of  the  different  Tab  Areas,  but 
rather  that  data  for  Tab  Areas  can  be  consolidated 
in  more  than  one  way  into  broader  categories  for 
which  reliable  figures  can  be  produced. 

In  some  instances,  efficient  stratification  re- 
sulted in  a stratum  being  composed  of  1 single 
large  PSU.  From  such  a stratum  the  single  PSU 
enters  the  sample  with  certainty  and  is  called  a 
self-representing  PSU.  Each  of  the  8 largest  SMA's 
and  102  other  PSU's  became  self-representing 
PSU's.  Each  PSU  drawn  into  the  sample  from  a 
nonmetropolitan  stratum  was  utilized  later  as  the 
frame  for  both  "other  urban"  and  "other  rural" 
tabulation  areas.  Table  12  in  Appendix  VIII  shows 
the  geographic  distribution  of  both  self-represent- 
ing and  nonself-representing  PSU's. 
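
The counts quoted in the preceding paragraphs can be cross-checked with simple arithmetic. The sketch below takes the figures in the text at face value; the variable names are illustrative, and the count of nonself-representing strata is derived here rather than stated in the text.

```python
# Tab Areas: the 8 largest SMA's plus 3 subsections in each of 11 sections.
LARGE_SMAS = 8
SECTIONS = 11
SUBSECTIONS_PER_SECTION = 3
tab_areas = LARGE_SMAS + SECTIONS * SUBSECTIONS_PER_SECTION

# First-stage units: 1 PSU is drawn from each of the 372 strata.
STRATA = 372
OTHER_SELF_REPRESENTING = 102
self_representing = LARGE_SMAS + OTHER_SELF_REPRESENTING
nonself_representing = STRATA - self_representing
```

The first total reproduces the 41 Tab Areas named earlier. The second implies 110 self-representing PSU's and, by subtraction, 262 strata from which a PSU was drawn with probability less than 1.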

Selection  zones.— For  sampling  purposes  and 
in  order  to  reduce  over-all  variance,  the  civilian 
population  in  the  United  States  is  divided  into  3 
mutually  exclusive  classes  or  selection  zones: 

Zone A. Those persons living in common
dwelling places.

Zone  B.  Those  persons  living  in  areas  of  "new 
housing." 

Zone  C.  Those  persons  living  in  large  spec- 
ial dwelling  places. 

Common  dwelling  places  include  what  would  be  or- 
dinarily regarded  as  such — for  example,  private 
homes,  apartment  houses,  and  duplexes.  Areas  of 
new  housing  are  simply  those  in  which  considerable 
new  housing  has  been  built  since  the  last  population 
census  (April  1950)  and  which  have  been  recorded 
and  mapped  by  the  Census  Bureau.  These  may  in- 
clude areas  which  would  be  classed  as  belonging 
to  either  Zone  A or  Zone  C except  that  they  are 
positively  identified  as  being  in  Zone  B.  Special 
dwelling  places  include  such  places  as  penitentia- 
ries, reformatories,  homes  for  the  aged,  mental 
hospitals,  and  hotels  for  transients. 

The  372  first-stage  units  are  identical  for  all 
3 zones,  but  later-stage  sampling  is  handled  sepa- 
rately for  each  zone. 

For the large special dwelling places, Zone C,
lists  of  individual  institutions  and  organizations  in 
the  sample  PSU's  were  assembled  from  a variety 
of  sources.  These  listed  places  are  excluded  from 
further  area  sampling.  Special  instructions  for 
drawing  samples  of  persons  from  Zone  C are  pre- 
pared for  the  different  types  of  special  dwelling 
places.  Such  persons,  constituting  about  2 percent 
of  the  universe,  have  not  been  included  in  initial 
tabulation  of  data  and  are  not  discussed  further  in 
this  account. 

The  relationship  between  selection  Zones  A 
and  B and  between  Zones  B and  C is  slightly  more 
complex and makes use of the principle of stratification
after sampling (reference 3, and page 468 of reference
2). One of the risks of area sampling, when using
data  on  number  of  households  for  a prior  year  as 
the  basis  for  selection,  lies  in  the  existence  of 
large  units  of  new  construction  built  since  the  prior 
year. This phenomenon causes no bias in estimates,
but  unless  corrective  action  is  taken  will  increase 
variance. 

From  the  National  Housing  Inventory  of  1956 
the  Census  Bureau  had  available  a record  of  large 
new construction activities in many of the 372 sample
PSU's. Consider these PSU's as being stratified
into 2 classes: (1) those PSU's which contain
areas  of  new  housing  so  identified  by  the  Inventory; 
and  (2)  all  other  PSU's.  Segments  are  selected 
from  within  all  sample  PSU's  to  represent  Zones 
A and  C.  Zones  A and  C are  mutually  exclusive; 
that  is,  they  do  not  overlap.  The  segments  selected 
from  the  class  1 PSU's  are  then  examined  to  see 
if  they  fall  into  areas  classified  as  new  construc- 
tion areas  according  to  the  Housing  Inventory. 
Those  segments  from  class  1 which  do  not  fall  into 
new  construction  areas  and  all  segments  selected 
from  class  2 PSU's  are  retained  and  become  the 
independent  samples  for  Zones  A and  C.  The  seg- 
ments or  parts  of  segments  which  are  contained  in 
the  areas  of  new  construction,  and  which  were  ini- 
tially drawn  into  the  sample,  are  at  this  point  de- 
leted from  the  original  sample.  An  independent 
sample  is  then  taken  from  among  the  new  con- 
struction areas  of  Zone  B at  the  same  sampling  rate 
as Zone A. Over-all sampling ratios for all 3 selection
Zones A, B, and C are identical within each
Tab  Area.  Approximately  8 percent  of  the  popula- 
tion and  of  the  sample  are  accounted  for  by  Zone 
B. 

Selection of segments in Zone A.-Thus by far
the  major  part  (90  percent)  of  the  sample  is  found 
in  Zone  A.  For  many  purposes,  it  is  convenient  to 
think  of  the  sample  as  consisting  only  of  Zone  A. 
An  outline  is  given  here  of  the  way  in  which  sam- 
pling within  PSU's  is  carried  out  for  selection  Zone 
A.  An  example  of  the  process  is  given  in  Appendix 
VI. 

The  ultimate  sampling  unit  within  the  PSU  is 
called  a segment.  It  is  a geographically  defined 
parcel  which  contains  an  expected  6 households. 
Segments  to  be  included  in  the  sample  are  chosen 
separately  for  each  Tab  Area  in  a series  of  steps 
or  stages. 

Survey  specifications  resulted  in  a require- 
ment that  over  a period  of  a year  144  segments 
are  to  be  surveyed  in  each  Tab  Area.  Within  cho- 
sen segments,  all  households  are  interviewed.  (As 
noted  in  Section  4,  if  it  develops  that  a selected 
segment  contains  obviously  more  than  20  house- 
holds, it  is  subsampled  and  approximately  6 house- 
holds in  it  are  interviewed.) 

The  selection  procedure  allocates  the  number 
of  segments  to  be  interviewed  to  first-stage  units 
in  the  Tab  Area  in  proportion  to  the  size  of  the 
stratum  they  represent.  Segments  are  drawn  with- 
in PSU's  through  a sequence  of  selection  of  suc- 
cessively smaller  units  of  area  until  finally  a unit 
containing  the  expected  6 households  is  secured. 
This becomes the ultimate sampling unit. An illustration
of the procedure is given in Appendix VI.
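
The allocation of a Tab Area's 144 segments among its first-stage units, in proportion to the size of the strata they represent, can be sketched as below. The stratum sizes are hypothetical, and the largest-remainder rounding is an assumption adopted only to make the allocation sum to 144; the text does not say how fractional allocations were handled. The annual totals are simple products of figures quoted in the text, taken at face value and before any exclusions, subsampling, or noninterviews.

```python
TAB_AREAS = 41
SEGMENTS_PER_TAB_AREA = 144          # per year
EXPECTED_HOUSEHOLDS_PER_SEGMENT = 6

annual_segments = TAB_AREAS * SEGMENTS_PER_TAB_AREA
annual_households = annual_segments * EXPECTED_HOUSEHOLDS_PER_SEGMENT
quarterly_segments_per_tab_area = SEGMENTS_PER_TAB_AREA // 4


def allocate_segments(stratum_sizes, total_segments=SEGMENTS_PER_TAB_AREA):
    """Allocate segments to strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    exact = {s: total_segments * size / total for s, size in stratum_sizes.items()}
    alloc = {s: int(x) for s, x in exact.items()}      # whole segments first
    leftover = total_segments - sum(alloc.values())
    # Hand the remaining segments to the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


# Hypothetical Tab Area made up of three strata.
allocation = allocate_segments({"a": 500_000, "b": 300_000, "c": 200_000})
```

On these assumptions the design implies on the order of 5,900 segments and 35,000 expected households a year.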

Samples for the year, quarter, and week.-Initial
sampling is carried on in a way which makes
the  segments  reported  for  each  calendar  quarter 
an  independent  sample  of  the  land  area  of  the  United 
States.  The  quarterly  samples  are  additive  and 
thus  the  annual  sample  is  4 times  the  size  of  the 
quarterly  samples.  The  samples  are  also  random- 
ized by  weeks  within  each  quarter,  so  that  each 
week's  interviews  become  a random  sample  of  the 
population  and  the  weekly  samples  are  additive 
within  the  quarter.  The  detail  by  which  this  is  ac- 
complished is  illustrated  in  Appendix  VII.  The  full 
survey  design  is  effective  over  each  quarter.  The 
weekly  samples  are  unbiased  but  necessarily  fol- 
low a more  restricted  design,  on  the  average  de- 
pending upon  a first-stage  selection  of  60  rather 
than  372  PSU's. 


Mapping  of  Segments 
and  Listing  of  Households 

For each segment in the sample, the interviewer
is furnished 2 maps: a Key Map and a Segment
Map.

The  Key  Map  shows  the  general  location  of  the 
segment  and  may  be  a county  highway  map  or  a 
city  street  or  block  map.  The  segment  number  and 
approximate  location  of  the  segment  (shown  by  the 
large dot beneath "Hillcrest Avenue" in figure 4)
are entered on the Key Map.


Figure 4. Key Map showing Segment 0534.

The Segment Map shows exact boundaries of
the segment and may show either the exact or general
location of some structures within the segment,
depending on the kind of map available. In
some cases no structures are shown on the map.
Two  illustrations  of  Segment  Maps  are  shown  in 
figures  5 and  6.  The  segment  boundaries,  in  any 
case,  are  outlined  on  the  map.

Figure  6.  A second  type  of  Segment  Map.

D - Dwelling  F - Flat  S - Store  Apts.  - Apartments 

The  numbers  inside  indicate  the  number  of  floors  in  the  structure  and  the
numbers  in  the  margin  are  street  numbers.

Interviewers  are  instructed  to  list  all  households
or  dwelling  units  in  the  selected  segments.
The  meaning  of  this  instruction  is  summarized  in 
the  sentence:  "Write  on  prepared  forms  the  ad- 
dresses or  other  descriptions  of  all  places  where 
people  live  or  might  live,  including  such  places  as 
ordinary  house  dwellings,  apartments,  duplexes, 
trailers,  tents,  houseboats,  converted  boxcars,  and 
rented  rooms,  including  everything  which  lies  in- 
side the  defined  segment."  The  instruction  is  sup- 
ported and  amplified  by  the  maps  and  a 93-page 
indexed  listing  manual.  The  listing  operation  is  con- 
ducted at  a time  prior  to  interviewing,  thus  pro- 
viding 2 checks  on  coverage:  one  at  the  time  of  list- 
ing and  a second  at  the  time  of  interviewing. 

Summary  of  units.— Several  different  kinds  of 
units  and  categories  are  mentioned  in  this  section 
and  in  the  Appendices.  It  may  be  helpful  to  reca- 
pitulate in  capsule  form  the  principal  elements  of 
terminology. 


Tab Area - One of 41 subuniverses, defined by geographic boundaries and by size and density of population.

PSU - Primary sampling units consisting of 1 or a group of contiguous counties: about 1,900 of them in the United States; 372 in the NHS sample.

Strata - 372 socioeconomic classes into which the PSU's are grouped.

ED - Enumeration District, a geographic subdivision of a PSU, usually containing between 50 and 1,000 households.

Segment - A subdivision of an ED containing an expected 6 households. This is normally the ultimate sampling unit in the survey.

Selection Zones - Strata in a different dimension, based upon type of dwelling unit, and utilized in reducing variance.

Dwelling Unit - Place where persons live or might live. This is the unit listed for subsequent interviewing purposes.

Elementary Unit - There are 4 elementary units or channels for processing information which are utilized in the survey: (1) the household; (2) the person; (3) the health condition (illness, injury, chronic condition, or impairment); and (4) the episode of hospitalization.


The  Estimating  Process 

Some  aspects  of  the  estimating  process  were 
treated  in  Section  4 under  Editing  and  Processing, 
and  other  aspects  are  influenced,  of  course,  by  the
sample  design  which  has  just  been  discussed.  In
what  follows  in  the  present  section,  the  focus  of 
attention  is  on  the  estimating  problem  as  such. 

The  estimation  process  in  the  health  survey 
is  basically  simple,  although  actual  procedure  in- 
cludes a considerable  number  of  steps.  Leading 
reasons  for  the  apparent  complexity  are  4 in  num- 
ber, growing  largely  out  of  the  fact  that  the  survey 
produces  a variety  of  estimates  in  several  dimen- 
sions: 

Geographic  scope.— The  survey  yields  work- 
sheet figures  for  the  Nation  as  a whole,  and  also 
for  constituent  Tab  Areas.  The  Tab  Areas  can  be 
combined  into  geographic  divisions  of  the  country, 
or  into  classes  which  reflect  size  and  density  of 
the  population  in  the  community. 

Type  of  statistic.— Three  variations  may  be 
distinguished  under  this  heading.  (1)  The  number 
or  proportion  of  persons  in  the  population  with  a 
specified  characteristic,  such  as  having  1 or  more 
chronic  conditions,  or  not  having  visited  a physi- 
cian within  the  year  immediately  previous  to  the 
week  of  interview.  (2)  Estimated  volumes  of  events 
arising  from  tabulating  answers  to  such  direct
questions  as,  "How  many  days  were  you  in  the  hos- 
pital, not  counting  the  day  you  left?",  with  editing 
converting  the  reply  to  "Number  of  hospital  days 
in  past  year."  (3)  The  incidence  of  a particular 
disease  or  health  condition,  built  up  from  cumulating
occurrences  over  2-week  periods  as  reported
by  persons  interviewed  in  successive  weeks. 

The  first  type  of  statistic  named  above  will  be 
recognized  as  an  instance  of  binomial  estimation 
(modified  of  course  by  the  structure  of  the  sample 
design),  since  each  individual  respondent  will  either 
have  or  not  have  the  specified  characteristic.  The 
second  type  of  statistic  is  like  the  first  except  that 
the  population  measures  involved  are  quantitative 
rather  than  qualitative  variables,  and  consequently 
estimation  is  not  binomial.  The  distinction  between 
the  second  and  third  types  of  statistic  is  sharpened 
perhaps  with  an  example.  Approximately  115,000 
persons  are  interviewed  each  year  in  the  health 
survey,  about  2,200  each  week.  Each  of  the  115,000 
persons,  in  effect,  gives  the  interviewer  the  num- 
ber of  days  he  spent  in  the  hospital  in  the  previous 
year,  and  thus  provides  data  which  permit  an  esti- 
mate of  the  number  of  days  of  hospitalization  ex- 
perienced by  living  persons  in  the  year  previous  to 
the  week  of  the  interview.  This  is  a type  (2)  esti- 
mate. Similarly,  each  week  approximately  2,200 
persons  report  their  days  of  hospitalization  in  the 
previous  2 weeks.  Summing  these  reports  over  52 
weeks  of  interviewing  and  taking  account  of  the  1- 
week  overlap  in  reference  periods  for  adjacent 
weeks  of  interviewing  would  provide  the  basis  for 
a second  estimate  of  a year's  hospitalization,  this 
time  the  resulting  statistic  being  of  type  (3).  More 
is  said  later  on  the  procedure  whereby  estimates 
of  type  (3)  are  produced. 

Some  might  also  wish  to  distinguish,  under
this  title,  between  estimates  of  an  aggregate,  such
as  total  number  of  physician  visits  for  a
class  of  persons,  and  estimates  of  a rate  which
would  express  the  number  of  physician  visits  for
the  class  per  100  persons  in  the  class.

Time  reference.— Since  the  sample  is  continu- 
ous, it  can  be  used  to  provide  estimates  based  on 
interviews  of  the  population  over  a week,  a quarter, 
a year,  or  other  time  intervals.  Also  reference 
periods  for  occurrence  or  volume  of  events  can  be 
varied  widely,  ranging  for  some  items  from  a week 
to  any  multiple  of  weeks  within  the  history  of  the 
survey. 

Form  of  estimate.— The  statistics  produced 
from  the  stratified  design  through  2 stages  of  ra- 
tio estimation  are  the  products  of  a design  which  is 
much  more  efficient  than  a simple  random  sample 
would  have  been,  but  which  necessarily  require 
somewhat  more  elaborate  computation. 

Steps  in  estimation.— In  the  interest  of  bringing 
out  main  threads  of  the  estimating  story,  obscured 
as  little  as  possible  by  the  crosscurrents  just  noted, 
the  remainder  of  this  section  is  written  mostly 
around  the  production  of  estimates  of  an  average 
number  of  persons  with  a specified  characteristic, 
the  average  being  based  on  interviewing  over  13 
weeks.  The  population  referred  to  is  the  civilian 
noninstitutional  population  of  the  continental  United 
States  rather  than  that  of  one  of  the  Tab  Areas. 
An  aggregate  rather  than  a proportion  or  rate  is 
the  statistic  under  observation.  Occasional  varia- 
tions from  this  pattern  will  be  necessary. 


Step  1

As  indicated  earlier,  incoming  reports  are
passed  through  controls  to  insure  that  the  data  input
to  the  computers  is  consistent  with  sample  design,
properly  coded,  and  capable  of  being  tabulated.

Step  2 

A series  of  mechanical  edits  are  carried  out 
on  the  computers.  These  edits  make  the  question- 
naire internally  consistent,  and  adjust  or  account 
for  item  nonresponse. 

Step  3 

Into  each  record  of  an  elementary  unit  (person, 
household,  condition,  and  hospitalization)  basic 
sampling  inflation  factors  are  inserted.  This  step 
takes  account  of  all  stages  of  sampling.  The  factor 
is  the  reciprocal  of  the  combined  sampling  frac- 
tion which  for  a quarterly  tabulation  varies  among 
Tab  Areas  from  about  1-in-2,000  persons  to  about
1-in-19,000  persons.  [Sampling  fractions  for  annual
samples  are  one  quarter  of  these  numbers.]
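
The arithmetic of Step 3 can be sketched in a few lines of Python. This is an illustration only: the function name `inflation_factor` is an assumption made here and is not taken from the survey's computer programs.

```python
def inflation_factor(sampling_fraction: float) -> float:
    """Return the basic inflation factor, the reciprocal of the sampling fraction."""
    if not 0.0 < sampling_fraction <= 1.0:
        raise ValueError("sampling fraction must lie in (0, 1]")
    return 1.0 / sampling_fraction


# Extremes of the quarterly sampling fractions quoted in the text.
quarterly_factors = [inflation_factor(1 / 2000), inflation_factor(1 / 19000)]

# Four quarterly samples make up the annual sample, so each record in an
# annual tabulation carries one quarter of its quarterly factor.
annual_factors = [f / 4 for f in quarterly_factors]
```

Under these assumptions a record from the most heavily sampled Tab Area stands for about 2,000 persons in a quarterly tabulation and about 500 in an annual one.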

Step  4 

Statistical theory demonstrates that a "ratio estimate" for any statistic is superior to an ordinary "inflation estimate" if there is correlation between the numerator and the denominator of the ratio. Specifically, if $Y'$ and $X'$ are ordinary inflation estimates of 2 characteristics of a population, $Y$ and $X$, respectively, and if the "true" total $X$ is known independently, then the ratio estimate

$$Y^{*} = \frac{Y'}{X'}\,X$$

is a better estimate of $Y$ than is $Y'$, provided there is correlation between $Y'$ and $X'$. In this form of estimate, the quantity $X/X'$ becomes a calibration factor for the survey.

This  principle  is  utilized  at  2 stages  in  the 
NHS.  In  the  first  instance  it  is  used  to  reduce  sam- 
pling variance  between  PSU's.  Estimates  of  the 
1950  population  which  would  have  been  obtained 
from  a complete  enumeration  of  the  372  PSU's  but 
not  other  PSU's  in  the  country  were  compared  with 
official  1950  population  counts  for  each  of  120 
color-residence  classes.  Resulting  factors  are 
shown  in  table  A. 

In  calculation,  these  factors  are  used  in  the 
following  manner,  the  arithmetic  being  carried  out 
automatically  by  the  computers.  Consider  a sample 
record  for  a person  who  is  white  and  who  comes 
from  an  urban  nonself-representing  Standard  Met- 
ropolitan Area  in  Geographic  Region  1.  All  sample 
records  for  this  person  are  multiplied  by  the  fac- 
tor 1.075380.  (See  1st  line,  2d  column  of  table  A.) 
This  brings  the  sample  data  into  closer  conform- 
ity with  population  controls  for  the  universe,  intro- 
duces only  trivial,  if  any,  bias  into  the  estimate, 
and  reduces  sampling  variance. 
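
A minimal sketch of the Step 4 arithmetic follows. It assumes the ratio estimate takes the usual form, the inflation estimate multiplied by the known total over its own inflation estimate. The function names and the numbers in the first example are hypothetical; only the factor 1.075380 comes from table A.

```python
# First-stage factor quoted in the text (table A): white, urban,
# nonself-representing SMA, geographic section 1.
FIRST_STAGE_FACTOR = 1.075380


def ratio_estimate(y_inflated: float, x_inflated: float, x_known: float) -> float:
    """Ratio estimate of Y: the inflation estimate Y' scaled by X / X'."""
    calibration = x_known / x_inflated
    return y_inflated * calibration


def apply_first_stage(weight: float, factor: float = FIRST_STAGE_FACTOR) -> float:
    """Multiply a record's inflation weight by its first-stage factor."""
    return weight * factor


# Hypothetical numbers: the sample PSU's put a class at 930,000 persons
# when its known 1950 count is 1,000,000.
adjusted = ratio_estimate(y_inflated=46_500.0, x_inflated=930_000.0, x_known=1_000_000.0)

# A record carrying a basic inflation factor of 2,000.
weighted = apply_first_stage(2000.0)
```

In the hypothetical case the calibration factor is about 1.075, so a characteristic first estimated at 46,500 persons is raised to about 50,000.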

NOTE:  Steps  1 through  4 are  carried  out
weekly,  and  provide  a "deck  of  cards"  (Univac
tape)  of  edited  and  adjusted  sample  data
for  each  week  of  the  13  weeks  of  the  quarter.
The  "scale"  of  data  at  this  point  is  therefore
1/13th  of  universe  totals.  Weekly  data  are
merged  later  into  quarterly  totals.  Steps  5
and  later  apply  to  the  merged  quarterly  decks.

Step  5 

Despite  intensive  follow-up  efforts,  reports
on  some  households  in  the  sample  have  not  been 
received  at  the  tabulation  cutoff.  In  the  first  2 
quarters  of  operation  the  noninterview  rate  was  6 
percent — 1 percent  refusal,  and  the  rest  for  all 
other  reasons,  such  as  no  one  at  home  after  re- 
peated call  backs.  For  a sample  household  for 
which  no  interview  is  obtained,  any  estimating  pro- 
cedure must  necessarily  impute  values  for  each 
statistic  for  which  measurement  had  been  intended. 

Table A. First-stage ratio estimate factors for nonself-representing¹ PSU's by residence, color, and section

               Urban             Rural nonfarm           Rural farm
Section      White   Nonwhite      White   Nonwhite      White   Nonwhite

Nonself-representing SMA's, by geographic section
      1   1.075380   1.753515    .678533    .673733    .579098    .488372
      2    .973243    .809129   1.048442    .927872   1.345792    .721580
      3    .755966    .724533    .664274    .633703    .723622    .695719
      4   1.328674    .927637   1.158533   1.158533   1.580769   1.580769
      5   1.210693    .509411   1.461507    .428340   1.640072    .383598
      6   none
      7    .973222    .869424    .889450    .715170    .870792    .745692
      8   1.076720   1.027738    .899394   1.349170    .776500    .991501
      9   1.179743   1.179743   1.094866   1.874840   1.250069   1.250069
     10    .873733    .492896   2.502347   2.502347    .873241    .873241
     11   none

Nonself-representing non-SMA's, by geographic section
      1   1.000096    .791543   1.066578   1.084175    .991135   1.003409
      2   1.050026    .891068    .959741   1.243901    .991556   1.301490
      3   1.101374   1.476958    .962977    .883253    .919709    .728575
      4   1.022545   1.142936    .953182    .912743    .997690    .871958
      5   1.109184   1.649665    .873888    .873888   1.000277   1.673347
      6   1.000335   1.065752   1.011195   1.276623   1.013374   1.156285
      7    .988710   1.044770   1.086449   1.027072   1.008986   1.071351
      8    .984943   1.028117   1.016688   1.026288   1.004002    .968411
      9    .980173    .977126   1.005968   1.013235   1.016431    .990630
     10   1.043498    .979703   1.005739   1.108361    .890752   1.199575
     11   1.020053   1.203531    .978919    .807592   1.006944   1.009057

¹The first-stage ratio estimate factor for each of 8 large separate tabulation areas and for the self-representing PSU's is 1.000000.

Adjustment for noninterviews in the health survey is accomplished by a calculation which assumes that respondents within a particular segment for a quarter represent the nonrespondents in that segment. In the rare instance in which less than half of a segment is interviewed, the noninterview adjustment is modified by evidence from reports over the entire Tab Area. An illustration of the process is given for a hypothetical Tab Area:


Segment     Households       Households     Segment       Excess
number      scheduled        not            adjustment    noninterviews
            for interview    interviewed    factor

1                 6                0          1.0000            0
2                 6                1          1.2000            0
3                 8                0          1.0000            0
4                 4                3          2.0000            2

Tab Area
total           220               10             -              2

Data  for  the  5 reports  in  segment  2 are  multiplied 
by  the  factor  1.2000  so  that  the  5 reports  represent 
the  6 households  intended  for  interview  in  the  seg- 
ment. Segment  4 in  the  example  is  of  the  unusual 
type  (where  less  than  half  the  households  in  the 
segment  were  interviewed)  which  leads  to  a fur- 
ther adjustment  at  the  Tab  Area  level  after  a pre- 
liminary one  has  been  made  at  the  segment  level. 
The  Tab  Area  adjustment  factor  is  the  ratio  of 
total  households  scheduled  for  interview  to  total 
households  scheduled  for  interview  less  the  "ex- 
cess" noninterviews;  that  is,  the  factor  is  220/218 
or  1.0092,  in  the  example.  Data  for  all  reporting 
households  in  the  Tab  Area  are  multiplied  by  this 
factor  to  account  for  the  2-household  "excess"  of
noninterviews. 
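
The noninterview adjustment in the illustration can be sketched as follows. The rule coded here, that a segment's factor is capped at 2 when fewer than half of its households are interviewed and that the households left unrepresented become the "excess", is inferred from the figures shown for segment 4 and should be read as an assumption. The function names are hypothetical.

```python
from typing import Tuple


def segment_adjustment(scheduled: int, not_interviewed: int) -> Tuple[float, int]:
    """Return (segment adjustment factor, excess noninterviews) for one segment."""
    interviewed = scheduled - not_interviewed
    if interviewed <= 0:
        raise ValueError("a segment with no interviews is outside this sketch")
    if 2 * interviewed >= scheduled:
        # At least half interviewed: respondents stand in for all households.
        return scheduled / interviewed, 0
    # Fewer than half interviewed: cap the factor at 2 and pass the
    # households still unrepresented up to the Tab Area level.
    return 2.0, scheduled - 2 * interviewed


def tab_area_factor(total_scheduled: int, total_excess: int) -> float:
    """Households scheduled divided by households scheduled less the excess."""
    return total_scheduled / (total_scheduled - total_excess)


# The four segments shown in the illustration: (scheduled, not interviewed).
segments = [(6, 0), (6, 1), (8, 0), (4, 3)]
results = [segment_adjustment(s, n) for s, n in segments]

# Tab Area totals from the illustration: 220 scheduled, 2 excess noninterviews.
area_factor = round(tab_area_factor(220, 2), 4)
```

With the figures of the illustration this gives segment factors of 1.0, 1.2, 1.0, and 2.0, an excess of 2 households in segment 4, and a Tab Area factor of 220/218.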

Step  6 

Advantages  of  the  ratio-estimating  process
are  exploited  further  by  the  introduction  of  a second
calibrating  or  ratio  factor  which  brings  the
estimates  of  the  U.  S.  population  derived  from  the
health  survey  into  agreement  with  independently 
determined  controls  for  76  age-sex-color  classes 
of  the  population.  For  the  first  full  quarter  of  oper- 
ation these  factors  ranged  from  0.61  to  1.36,  with 
NHS  estimates  for  62  of  the  76  classes  coming 
within  12  percent  of  the  controls.  The  over-all 
NHS  estimate  of  the  U.  S.  population  before  this 
final  adjustment  was  within  0.3  of  1 percent  of  the 
control  on  the  population. 
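
The second-stage adjustment of Step 6 amounts to one factor per age-sex-color class, the independent control divided by the survey's own estimate for that class. The sketch below uses hypothetical class labels and figures; the report does not list the 76 classes in this section.

```python
from typing import Dict


def second_stage_factors(
    survey_estimates: Dict[str, float], controls: Dict[str, float]
) -> Dict[str, float]:
    """Factor for each class: independent control divided by survey estimate."""
    return {cls: controls[cls] / survey_estimates[cls] for cls in controls}


# Hypothetical figures (thousands of persons) for 2 of the 76 classes.
survey_estimates = {"class A": 9_500.0, "class B": 800.0}
controls = {"class A": 10_000.0, "class B": 760.0}

factors = second_stage_factors(survey_estimates, controls)

# After the factors are applied, each class total agrees with its control.
adjusted = {cls: survey_estimates[cls] * factors[cls] for cls in factors}
```

Every record in a class is multiplied by that class's factor, so estimates of health characteristics within the class are scaled by the same amount as the population count.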

The  effect  of  these  6 steps  is  (1)  to  use  the 
household  survey  as  an  instrument  for  obtaining 
percent  distributions  of  the  population  by  specified 
characteristics  of  illness  and  health  conditions, 
and  (2)  to  produce  estimates  of  total  numbers  of 
persons  in  the  population  with  these  specified  char- 
acteristics by  multiplying  the  derived  percent  dis- 
tribution by  population  controls.  Rates  are  calcu- 
lated by  obtaining  ratios  of  the  appropriate  esti- 
mated aggregates. 

Tabulations  of  items  other  than  average  num- 
ber of  persons  with  specified  characteristics  over 
a quarter  are  obtained  in  a similar  manner,  but 
with  variations  in  procedure,  the  particular  varia- 
tions depending  on  the  nature  of  the  item.  Two  ex- 
amples may  suggest  the  kinds  of  variations  which 
are  needed. 

Consider  again  the  type  (3)  estimate  discussed 
above,  in  which  the  objective  is  to  obtain  an  esti- 
mate of  the  total  number  of  days  of  hospitalization 
over  a year,  and  consider  first  an  estimate  over  1 
quarter.  An  item  on  the  questionnaire  asks  each
person  for  the  number  of  such  days  in  the  2-week
period  immediately  preceding  the  calendar  week
of  interview.  Each  week's  interviewing,  since  it  is
an  independent  sample  of  the  population,  produces,
by  the  process  described  in  the  6 steps  above,  an
estimate  of  1/13th  of  the  total  hospital  days  over  a
2-week  period.  (It  will  be  recalled  that  the  sampling
fractions  have  been  expressed  in  terms  of  13  weeks
of  interviewing,  and  the  weighting  factors  have  been
set  accordingly  in  the  computer.)  Multiplication  by
6.5  yields  1/13th  of  the  total  hospital  days  for  a 13-week
period.  Summation  of  samples  over  the  quarter
yields  the  estimate  for  a 13-week  period.  The  particular
13-week  period  is  the  one  extending  from
the  12th  week  of  the  quarter  preceding  the  quarter 
of  interviewing  through  the  11th  week  of  the  quar- 
ter of  interviewing,  since  tabulation  is  geared  to 
weeks  of  interviewing  which  lie  in  the  calendar 
quarter.  While  this  period  does  not  correspond 
exactly  with  the  13  calendar  weeks  of  the  quarter, 
the  displacement  is  small,  and  estimates  made  in 
this  manner  are  used  as  estimates  for  the  calendar 
quarter.  Similarly  produced  estimates  summed 
for  4 successive  quarters  would  yield  an  approxi- 
mate estimate  of  hospital  days  for  the  population 
over  the  year.  This  estimate  does  not  include  hos- 
pital days  for  persons  who  died  within  the  2-week 
period  immediately  preceding  the  week  of  inter- 
view, since  the  scope  of  the  household  survey  is 
the  living  population  in  the  week  of  interview. 
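
The scaling just described can be checked with a short sketch. Each weekly figure is taken to be the weighted sum of reported hospital days for that week's sample, already on the 1/13th scale; the function name and the numbers are hypothetical.

```python
from typing import Sequence

WEEKS_IN_QUARTER = 13
WEEKS_IN_REFERENCE_PERIOD = 2


def quarterly_hospital_days(weekly_weighted_days: Sequence[float]) -> float:
    """Sum the 13 weekly figures, each scaled from a 2-week to a 13-week basis."""
    if len(weekly_weighted_days) != WEEKS_IN_QUARTER:
        raise ValueError("expected one weighted figure per week of the quarter")
    scale = WEEKS_IN_QUARTER / WEEKS_IN_REFERENCE_PERIOD  # 6.5
    return sum(scale * days for days in weekly_weighted_days)


# Hypothetical check: if the population spends 1.0 million days in hospital
# every week, each weekly figure is 1/13 of a 2-week total of 2.0 million.
weekly = [2.0 / 13] * 13
total = quarterly_hospital_days(weekly)  # close to 13.0 (million days)
```

The result equals 13 weeks at 1.0 million days a week, which is the quarterly total the procedure is intended to recover.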
A second  illustration  relates  to  combining  estimates
for  more  than  1 quarter,  when  the  quarterly
estimates  have  been  expressed  as  rates.  The
problem  might  be  formulated  in  many  ways.  One 
will  suffice  here.  From  each  quarter's  sample  an 
estimate  of  the  average  number  of  persons  who 
have  experienced  1 or  more  days  of  bed-disability 
in  a 2-week  period  can  be  produced.  This  figure
divided  by  the  average  population  for  the  quarter 
yields  a rate.  An  annual  rate  based  on  experience 
for  a year  rather  than  for  a quarter  could  be  formed 
in  more  than  one  way.  An  acceptable  solution  is  a 
weighted  average  rate  calculated  as  indicated: 

Let

$E_i$ be the number of persons in the $i$th quarter with 1 or more days of bed-disability in a 2-week period, as estimated in the first example above,

$N_i$ be the average population in the $i$th quarter, and

$R_i$, equal to $E_i/N_i$, be the quarterly rate;

then the annual rate, $R$, may be estimated as

$$R = \frac{\sum_{i=1}^{4} R_i N_i}{\sum_{i=1}^{4} N_i}$$
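
The weighted average can be computed directly. The sketch below uses hypothetical quarterly rates and populations; the function name `annual_rate` is an assumption made for illustration.

```python
from typing import Sequence


def annual_rate(
    quarterly_rates: Sequence[float], quarterly_populations: Sequence[float]
) -> float:
    """Population-weighted average of the quarterly rates."""
    if len(quarterly_rates) != len(quarterly_populations):
        raise ValueError("need one population figure per quarterly rate")
    weighted_sum = sum(r * n for r, n in zip(quarterly_rates, quarterly_populations))
    return weighted_sum / sum(quarterly_populations)


# Hypothetical quarterly rates and average populations (millions).
rates = [0.10, 0.08, 0.06, 0.08]
populations = [168.0, 169.0, 170.0, 171.0]
rate = annual_rate(rates, populations)
```

Because each quarterly rate times its population returns the quarterly number of persons, the same figure is obtained by dividing the sum of the four quarterly numbers by the sum of the four populations.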


Sampling  and  Measurement  Errors 

Reliability  of  statistical  surveys.— All  statis- 
tical surveys,  whether  based  on  samples  or  at- 
tempted complete  enumerations,  are  subject  to 
potential  inaccuracies.  These  risks  include,  among 
others,  errors  in  conceptual  formulation,  ambigu- 
ities in  definition  and  in  the  questionnaire,  faulty 
classification,  interviewer  variability  and  bias, 
respondent  bias  and  variability,  biases  from  non- 
response or  incomplete  coverage,  mistakes  in 
editing,  and  tabulation  errors.  This  broad  group 
of  imperfections  can  be  subsumed  by  the  term 
"measurement  error,"  which  includes  all  nonsam- 
pling hazards.  Measurement  error  plus  sampling 
error  may  be  called  total  survey  error. 

Ideally  it  is  desirable  to  detect  all  major 
components  of  total  survey  error,  quantify  each  of 
them,  and  allocate  resources  in  such  a fashion  that 
total  survey  error  is  minimized.  Occasionally  it
is  preferable  to  exclude  from  consideration  certain
specified  components,  even  if  they  are  large,
if  their  presence  can  have  little  impact  on  decisions
which  will  be  based  on  results  of  the  survey;
for  example,  it  may  be  well  to  tolerate  certain
kinds  of  constant  bias,  if  the  survey  is  to  be
used  principally  to  assess  change  from  one  point
in  time  to  another.

Measurement  error. — A rather  substantial 
portion  of  the  total  budget  for  the  National  Health 
Household-Interview  Survey  is  earmarked  for  the
study  of  measurement  error  and  the  evaluation  of 
results.  This  topic  is  not  covered  in  any  detail  in 
the  present  report.  As  noted  earlier,  however,  the 
initial  program  includes  3 main  areas  of  explora- 
tion: (1)  built-in  tests  and  controls,  such  as  the 
reinterview  operation,  which  will  provide  data  on 
interviewer  variation  and  bias,  and  as  consistency 
controls  on  medical  coding;  (2)  external  special 
statistical  checks,  such  as  ad  hoc  studies  of  se- 
lected medical  and  health  records;  and  (3)  com- 
parative analysis  of  data  from  the  household-inter- 
view  survey  and  of  evidence  from  other  sources  of 
health  information. 

Sampling  error. — Since  estimates  from  the 
health  survey  are  based  on  a sample  of  households 
rather  than  on  a complete  census  of  persons  in  the 
United  States,  they  will  differ  somewhat  from  fig- 
ures which  would  be  obtained  from  a complete 
enumeration  using  the  same  schedules,  instruc- 
tions, interviewers,  and  procedures.  Inasmuch  as 
it  is  possible  in  the  sample  to  use  better  trained 
interviewers,  and  in  general  to  maintain  tighter 
operational  control  than  would  be  feasible  in  an  at- 
tempted complete  enumeration  of  many  character- 
istics of  a population  of  more  than  170  million 
persons,  it  is  entirely  likely  that  the  sample  re- 
sults are  subject  to  a smaller  measurement  error 
than  would  be  those  from  a census.  The  usual  yard- 
stick of  sampling  variability  is  the  standard  error, 
or  the  relative  standard  error.  Appendix  III  sets 
forth  the  method  by  which  standard  errors  for  sta- 
tistics from  the  survey  are  computed.  The  method 
used  reflects  both  the  chance  error  that  arises 
from  sampling,  and  a part  of  the  variation  which 
resides  in  the  measurement  process.  It  does  not 
include  the  part  of  measurement  variation  which 
is  unaffected  by  sample  size,  nor  does  it  include 
any  biases  which  may  lie  in  the  data. 

For  probability  samples  of  the  type  of  the 
health-interview  survey,  sampling  reliability  for 
any  statistic  from  the  survey  can  be  stated  roughly 
in  these  terms:  A census  would  produce  figures 
within  1 standard  error  of  the  published  sample 
estimate  for  about  2 out  of  3 of  the  statistics  shown 
and  within  2 standard  errors  of  the  published  sam- 
ple estimate  for  roughly  19  out  of  20  of  the  inde- 
pendent statistics  shown.  A somewhat  more  precise 
statement  might  read:  "In  a complete  enumeration 
conducted  under  identical  circumstances  the  meas- 
ured statistic  would  lie  in  the  interval  formed  by 
the  published  sample  figure  plus  or  minus  k times 
the  standard  error";  and  the  probability  that  this 
is  a true  statement  is  given  in  the  following  table. 


If k is     then the statement is true approximately

1           2 times out of 3
2           19 times out of 20
2½          99 times out of 100
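
The interval statement can be illustrated with one of the figures in table C. The sketch assumes that the sampling distribution of the estimate is approximately normal, which is the usual basis for a table of this kind but is not stated in the report; the function names are hypothetical.

```python
import math
from typing import Tuple


def interval(estimate: float, relative_se: float, k: float) -> Tuple[float, float]:
    """Published estimate plus or minus k standard errors."""
    se = estimate * relative_se
    return estimate - k * se, estimate + k * se


def normal_coverage(k: float) -> float:
    """Probability that a normal variable falls within k standard errors."""
    return math.erf(k / math.sqrt(2.0))


# Table C: about 199 million visits to the doctor, relative standard error 0.022.
low, high = interval(199.0, 0.022, k=2)

# Coverage under the normal assumption for the three values of k in the table.
coverage = {k: normal_coverage(k) for k in (1, 2, 2.5)}
```

Under the normal assumption the three coverages are roughly 0.68, 0.95, and 0.99, in line with the table, and the 2-standard-error interval for visits to the doctor runs from about 190 to about 208 million.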

Reports  published  by  the  health  survey  include 
statements  of  sampling  reliability  for  principal  es- 
timates included  in  the  report.  In  addition,  as  ex- 
perience is  gained,  it  is  expected  that  general 
guides  and  rules  of  thumb  will  be  developed  where- 
by users  of  the  statistics  can  secure  approximate 
sampling  errors  for  other  figures,  with  a minimum 
of  effort. 

It  may  be  useful  to  note  relative  magnitudes 
among  some  of  the  different  classes  of  statistics 
which  will  come  from  the  household  survey. 

If  V is  the  relative  standard  error  for  a sta- 
tistic which  refers  to  an  estimate  for  a U.  S.  total, 
then  relative  standard  errors  for  the  same  statis- 
tic when  it  refers  to  other  subdivisions  of  the 
United  States  usually  will  be  of  the  general  mag- 
nitude indicated  in  table  B. 


Table B. Magnitudes of statistics for several types of area

Area                                                       Rough magnitude of relative
                                                           sampling standard error

U. S. total                                                          V
A geographic section (e. g., New England)                        3.3 V
Rural United States                                              2.0 V
The non-metropolitan urban sector of the United States           2.0 V
Metropolitan United States                                       1.5 V

Similarly,  if  A is  a relative  standard  error
for  a statistic  which  rests  on  data  for  a year's  in- 
terviewing, the  magnitude  of  the  corresponding 
relative  error  for  the  statistic  based  on  1 quarter's 
sample  will  be  about  1.7  A. 

If  B is  a relative  error  for  a characteristic
possessed  by  1 percent  of  the  population,  the  relative
error  for  a statistic  possessed  by  10  percent  of
the  population  will  have  magnitude  approximately
30  percent  of  B;  the  relative  error  for  a statistic
possessed  by  50  percent  of  the  population  will  have
magnitude  of  the  order  of  10  percent  of  B.

Standard  errors  of  differences  between  esti- 
mates of  the  same  statistics  for  2 points  in  time 
will  be  40  percent  larger  than  the  standard  error 
of  the  statistic  at  a fixed  point  in  time. 
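
Two of these rules of thumb can be checked against simple binomial arithmetic. The sketch ignores the effects of the survey design, so it is only a rough consistency check, and the function name is hypothetical.

```python
import math


def binomial_relative_error(p: float, n: int) -> float:
    """Relative standard error of an estimated proportion p from n units."""
    return math.sqrt((1.0 - p) / (p * n))


n = 115_000  # persons interviewed in a year, per the text

# Relative errors at 10 and 50 percent, as fractions of the error at 1 percent.
base = binomial_relative_error(0.01, n)
ratio_10_percent = binomial_relative_error(0.10, n) / base  # about 0.30
ratio_50_percent = binomial_relative_error(0.50, n) / base  # about 0.10

# Difference of two independent estimates with equal standard errors.
difference_ratio = math.sqrt(2.0)  # about 1.41, roughly 40 percent larger
```

The two ratios do not depend on the sample size chosen, and they agree with the 30 percent and 10 percent figures quoted above. The same simple arithmetic would give a factor of 2 between a quarter and a year, somewhat above the 1.7 cited in the text.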

Finally,  the  reliability  of  an  estimated  rate  or 
percent,  computed  by  using  sample  data  for  both 
numerator  and  denominator,  depends  upon  the  size 
of  the  rate  and  the  size  of  the  total  upon  which  the 
rate  is  based.  Estimated  rates  are  relatively  more 
reliable  than  corresponding  absolute  estimates  of 


the  numerator  of  the  rate,  particularly  if  the  rate 
is  high.  However,  ratios  of  estimated  aggregates 
to  total  population  for  an  age-sex-color  class  have 
the  same  relative  sampling  variance  as  the  esti- 
mated aggregate,  as  a result  of  the  ratio  estimat- 
ing technique  which  was  employed. 

Illustrative  sampling  errors. — Relative  sam- 
pling errors  have  been  calculated  for  a number 
of  estimated  national  statistics  based  on  data  for 
the  first  13  weeks  of  interviewing.  The  extent  to 
which  these  values  prove  to  be  typical  must  await 
the  evidence  of  later  data.  Illustrative  errors  are 
presented  in  table  C. 


Table C. Illustrative relative sampling errors for national statistics from the U. S. National Health Survey, based on data from interviewing during the 13-week period ending September 29, 1957

Statistic                                                   Size of       Relative
                                                            statistic     standard
                                                            (000,000)     error

Number of bed-days for medically attended chronic
  conditions in last 12 months                                 756         0.010
Number of visits to the doctor                                 199         0.022
Number of acute conditions                                      70         0.030
Number of acute conditions, medically attended                  47         0.042
Number of persons with chronic limitation of activity           17         0.030
Number of persons injured in accidents                          14         0.051
Number of persons injured in motor-vehicle accidents             1         0.175

APPENDIX  I

ILLUSTRATION  OF  CONTENT  OF  INITIAL  BASIC  HOUSEHOLD  QUESTIONNAIRE 


The items below show the exact content and wording of the questionnaire used in the household survey. The actual questionnaire is designed for a household as a unit and includes additional spaces for reports on more than one person.

The National Health Survey is authorized by Public Law 652 of the 84th Congress (70 Stat 489; 42 U.S.C. 305). All information which would permit identification of the individual will be held strictly confidential, will be used only by persons engaged in and for the purposes of the survey, and will not be disclosed or released to others for any other purposes (22 FR 1687).


U.S.  DEPARTMENT  OF  COMMERCE
BUREAU  OF  THE  CENSUS
Acting  as  Collecting  Agent  for  the
U.S.  PUBLIC  HEALTH  SERVICE

NATIONAL  HEALTH  SURVEY 


Questionnaire 


Address  or  description  of  location 


4. Sub-sample weight    5. Sample    6. PSU


farm or ranch?    □ Yes  □ No

Ask  at  all  units  except  apartment  houses 

13.  Is  there  any  other  building  on  this  property  for 

people  to  live  in  - either  occupied  or  vacant?  (_J 


15. RECORD OF CALLS AT HOUSEHOLDS

Entire household

Callbacks for individual respondents

16. REASON FOR I

□ Temporarily i
□ Other (Specify)

□ Vacant - Non-seasonal
□ Vacant - seasonal
□ Usual residence elsewhere
□ Armed Forces
□ Other (Specify)

□ Demolished
□ In sample b;
□ Eliminated

Interview not obtained

Comments on non-interview

17. Signature of Interviewer:

Special instructions
EDITING  RECORD  FOR  OFFICE  USE  ONLY 


a. Result of edit:  □ Passed  □ Passed (EPQ)  □ Failed - no follow-up  □ Failed - follow-up

b. Type of follow-up:  □ Office telephone  □ Interviewer telephone  □ Personal

c. Result of follow-up:  □ Completed  □ Non-interview

d. Edited:  Editor, Date

e. Re-edited:  Editor, Date

f. Re-edited:  Editor, Date

(a) What is the name of the head of this household? (Enter

(b) What are the names of all other persons who live here?
usually live here, and all persons staying here who have
residence elsewhere. List these persons in the prescribed

(c) Do any (other) lodgers or roomers live here?  □ No

(d) Is there anyone else who lives here who is now away on business? On a visit? Temporarily in a hospital?  □ No

(e) Is there anyone else staying here now?  □ No

(f) Do any of these people have a home elsewhere?

□ No (leave on questionnaire)  □ Yes (If r


□ Yes (List)

□ Yes (List)


3. Race (Check


your last birthday?


you born? (Record state or foreign country)
MEDICAL CARE

□ Yes  □ No (skip to q. 20)

At home

Other (Specify)

What did you have done on the visit (or telephone call)?

□ Less than 1 mo.

DENTAL CARE

Last week or the week before did anyone in the family go to a dentist? Anyone else?

How long has it been since you went to a dentist?

Mos. or Yrs.

24. Is there anyone in the family who has lost all of his teeth?

□ Yes  □ No

HOSPITAL CARE

25. (a) DURING THE PAST 12 MONTHS has anyone in the family been a patient in a

□ Yes (Table II)  □ No

26. (b) How many times were you in a nursing home or sanitarium?

□ Yes (Table II)  □ No

27. During the past 12 months in which group did the total income of your family fall, that is, your's, your --'s, etc.? (Show Card H) Include income from all sources, such as wages, salaries, rents from property, pensions, help from relatives, etc.

Group No.

Card  G 

NATIONAL  HEALTH  SURVEY 

1.  Confined  to  the  house  all  the 

time, except in emergencies.

2.  Can  go  outside  but  need  the  help 

of  another  person  in  getting 
around  outside. 

3.  Can  go  outside  alone  but  have 

trouble  in  getting  around  freely. 

4.  Not  limited  in  any  of  these  ways. 

Card  H 

NATIONAL  HEALTH  SURVEY 

Family  Income  during  past 
12  months 

1. Under $500 (Including loss)

2.  $500  - $999 

3.  $1,000  - $1,999 

4.  $2,000  - $2,999 

5.  $3,000  - $3,999 

6.  $4,000  - $4,999 

7.  $5,000-  $6,999 

8.  $7,000  - $9,999 

9.  $10,000  and  over. 

Card  E 

NATIONAL  HEALTH  SURVEY 

For: 

Children  from  6 to  16  years  old  and 
others  going  to  school 

1.  Cannot  go  to  school  at  all  at 

present  time. 

2.  Can  go  to  school  but  limited  to 

certain  types  of  schools  or  in 
school  attendance. 

3. Can go to school but limited in

other  activities. 

4.  Not  limited  in  any  of  these  ways. 

Card  F 

NATIONAL  HEALTH  SURVEY 
For:  Children  under  6 years  old 

1.  Cannot  take  part  at  all  in  ordinary 

play  with  other  children. 

2.  Can  play  with  other  children  but 

limited  in  amount  or  kind  of  play. 

4. Not limited in any of these ways.

Card  C 

NATIONAL  HEALTH  SURVEY 

For: 

Workers  and  other  persons  except 

1.  Cannot  work  at  all  at  present. 

2.  Can  work  but  limited  in  amount 

or  kind  of  work. 

3.  Can  work  but  limited  in  kind  or 

amount  of  outside  activities. 

4. Not limited in any of these ways.

Card  D 

NATIONAL  HEALTH  SURVEY 
For:  Housewife 

1.  Cannot  keep  house  at  all  at 

present. 

2.  Can  keep  house  but  limited  in 

3.  Can  keep  house  but  limited  in 

outside  activities. 

4.  Not  limited  in  any  of  these  ways. 

Card  A 

NATIONAL  HEALTH  SURVEY 
Check  List  of  Chronic  Conditions 

1. Asthma
2. Any allergy
4. Chronic bronchitis
5. Repeated attacks of sinus trouble
6. Rheumatic fever
7. Hardening of the arteries
8. High blood pressure
9. Heart trouble
10. Stroke
11. Trouble with varicose veins
12. Hemorrhoids or piles
13. Gallbladder or liver trouble
14. Stomach ulcer
15. Any other chronic stomach trouble
16. Kidney stones or other kidney trouble
18. Prostate trouble
19. Diabetes
20. Thyroid trouble or goiter
21. Epilepsy or convulsions of any kind
22. Mental or nervous trouble
23. Repeated trouble with back or spine
24. Tumor or cancer
25. Chronic skin trouble
26. Hernia or rupture

Card  B 

NATIONAL  HEALTH  SURVEY 
Check  List  of  Impairments 

1. Deafness or serious trouble with hearing.

3.  Condition  present  since  birth,  such  as  cleft  palate  or 

club  foot. 

4.  Stammering  or  other  trouble  with  speech. 

5.  Missing  fingers,  hand,  or  arm. 

6.  Missing  toes,  foot,  or  leg. 

7.  Cerebral  palsy. 

8.  Paralysis  of  any  kind. 

9.  Any  permanent  stiffness  or  deformity  of  the  foot  or  leg, 

fingers,  arm,  or  back. 
APPENDIX II

ESTIMATING  EQUATIONS 


In the National Health Household-Interview Survey, the following algebraic statements summarize the estimation process for $X''$, an estimate of $X$, a population characteristic.

Let $P_{hij}$ be the probability of selecting the j-th PSU in the i-th stratum in the h-th Tab Area, with

$$P_{hij} = \frac{A_{hij}}{A_{hi}} ,$$

where $A_{hij}$ is 1950 population of the hij-th PSU and $A_{hi}$ is 1950 population of the hi-th stratum, and let $A_{chij}$ be number of persons in the c-th color-residence group in the hij-th PSU according to the 1950 Census. Then

$$V'_{ch} = \sum_{i=1}^{L_h} \frac{A_{chij}}{P_{hij}} ,$$

where $L_h$ is number of nonself-representing strata in the h-th Tab Area, and $V'_{ch}$ is an estimate of the number of persons in the c-th color-residence group in the nonself-representing strata in the h-th Tab Area.

The quantity $V_{ch}$ is the corresponding 1950 Census count. If next, $X_{vach}$ is the sample aggregate of the X-measure for the a-th age-sex-color class in the c-th color-residence group in the nonself-representing strata in the h-th Tab Area, and $X_{uach}$ is the corresponding aggregate for self-representing strata in the h-th Tab Area, and if further $f_h$ is the over-all sampling fraction for the h-th Tab Area, then

$$X'_{ah} = \frac{1}{f_h} \sum_{c} \left[ X_{uach} + X_{vach} \frac{V_{ch}}{V'_{ch}} \right]$$

is a first-stage ratio estimate of the characteristic for the a-th age-sex-color class in the h-th Tab Area.

In precisely the same manner, an estimate of $Z_{ah}$, the current population of the a-th age-sex-color class in the h-th Tab Area, is calculated as

$$Z'_{ah} = \frac{1}{f_h} \sum_{c} \left[ Z_{uach} + Z_{vach} \frac{V_{ch}}{V'_{ch}} \right] .$$

The total first-stage ratio estimate for the a-th age-sex-color class is, for the X-measure:

$$X'_a = \sum_{h} X'_{ah} ,$$

and for population:

$$Z'_a = \sum_{h} Z'_{ah} .$$

The final second-stage ratio estimate of the total X-measure is

$$X'' = \sum_{a} \frac{X'_a}{Z'_a} Z_a ,$$

where $Z_a$ is the independent current population estimate for the a-th age-sex-color class.
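
As a hedged illustration of the two ratio-estimation stages, the following Python sketch computes a first-stage estimate for one Tab Area and then the second-stage adjustment to independent population figures. The function names, dictionary layout, and numbers are assumptions made for illustration. The first-stage form used here (sample aggregates inflated by 1/f_h, with the nonself-representing part multiplied by the ratio of the 1950 Census count to its sample estimate) is this sketch's reading of the definitions in this Appendix, not a transcription of the survey's processing.

```python
def first_stage_estimate(x_self, x_nonself, census_count, estimated_count, f_h):
    """First-stage ratio estimate for one age-sex-color class in one Tab Area.

    Every argument except f_h is a dict keyed by color-residence group c.
    """
    total = 0.0
    for c in census_count:
        ratio = census_count[c] / estimated_count[c]  # V_ch / V'_ch
        total += x_self.get(c, 0.0) + x_nonself.get(c, 0.0) * ratio
    return total / f_h  # inflate by the over-all sampling fraction


def second_stage_estimate(x_by_class, z_by_class, z_independent):
    """Sum over classes a of (X'_a / Z'_a) * Z_a."""
    return sum(
        x_by_class[a] / z_by_class[a] * z_independent[a] for a in x_by_class
    )


# Hypothetical figures: one class "a1" and one color-residence group "c1".
x_ah = first_stage_estimate(
    x_self={"c1": 10.0},
    x_nonself={"c1": 5.0},
    census_count={"c1": 1000.0},
    estimated_count={"c1": 800.0},
    f_h=0.001,
)
final = second_stage_estimate({"a1": x_ah}, {"a1": 9000.0}, {"a1": 10000.0})
```

In this toy case the second stage scales the class estimate up because the sample-based population (9,000) falls short of the independent figure (10,000).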
APPENDIX III

SAMPLING  AND  MEASUREMENT  ERRORS 


Sampling Error - Basic Formulation

One of the attractive features of probability sampling designs is their inherent quality which permits determination of tolerance limits within which lie findings from the survey. More specifically, for such designs, it may be determined, with any specified degree of confidence, what the maximum differences are between results from the sample and those which would be found in a complete enumeration conducted under identical conditions.

In simple designs, determination of sampling variance most commonly is made in a series of 3 steps: (1) An exact sampling variance formula for the design is derived mathematically in terms of (unknown) population parameters. (2) Sample data for individual units are used to estimate the needed but otherwise unknown parameters. (3) The estimated parameters are substituted into the derived formula, and sampling variances evaluated.

For more complex designs such as that of the health survey, the procedure just outlined is usually not feasible or efficient, even when required formulas have been derived. Different methods, outlined in this section of Appendix III and described in somewhat greater detail in the next, are used in the health survey.

The fundamental rationale of these methods is simple and applies to all probability designs. All observations of a characteristic x are distributed randomly into m groups of k observations each. Each group permits making an estimate of approximately 1/m-th part of the population total, by a sample design which is essentially the same as the over-all design. Thus if $X'_g$ is the estimate from the g-th group, $\bar{x}'$ the mean of the m values $X'_g$, and $X'$ is the over-all estimate, then

$$X' = \sum_{g=1}^{m} X'_g ,$$

and the sampling variance of $X'$ is $S^2_{X'} = m S^2_{X'_g}$, where $S^2_{X'_g}$ is estimated by the variance of the group estimates:

$$S^2_{X'_g} = \frac{1}{m-1} \sum_{g=1}^{m} \left( X'_g - \bar{x}' \right)^2 .$$

This general scheme of estimation has been recognized by a number of statisticians. For example, Deming⁴ speaks of it as the Tukey plan; Hansen, Hurwitz, and Madow² and others describe it as the random group method. It is being used more widely as electronic computers make it more practicable.


Sampling Error Functions

The picture just sketched needs to be further highlighted in two important respects. In the health survey, attention usually is centered on estimates of aggregates or on estimates of ratios of two estimated aggregates. In either case, since simple estimated aggregates are obtained as ratios of an estimated statistic to estimated population, the ultimate estimate is a ratio, say $R'$, of two other estimates, say, $Y'$ and $X'$. Under the heading "Estimating Sampling Variances From Survey Data," beginning on page 27, a procedure for determining variance of a quantity $X'$ or $Y'$ is presented. An entirely analogous procedure yields the covariance of $X'$ with $Y'$. Finally, rel-variance of the estimate $R'$ is obtained from the equation:

$$V^2_{R'} = V^2_{X'} + V^2_{Y'} - 2 V_{X'Y'} ,$$

where the V-symbols represent relative variances and covariance of the subscript variables.

Thus the procedure can give variance for any aggregate or ratio. In the health survey, thousands of different estimates are being made. Even with high-speed computers, the cost of calculating variances for each separate estimate would be prohibitive. Further, such a step would be undesirable in that it would yield estimated variances which, because of their own sampling error, would appear at times to be inconsistent among themselves. For these reasons, either one of two courses is followed in Survey publications. In one of these, variances are calculated for only a few key items, and the reader is allowed to infer from these the order of variance for other items.

In the second course, a group of variables having certain common characteristics (such, for example, as being binomial variates) but differing in absolute size, are used to establish a fitted curve which expresses a "law" of variance for variables of the class. The fitted curve usually takes the form

$$V^2_{X'} = a + \frac{b}{X'} ,$$

where $X'$ is an estimate, $V^2_{X'}$ its relative sampling variance, and a and b are constants of the fitted curve. It is these readings from the curve which are used as best estimates of the variances.
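
The report does not say how the curve is fitted. As one possible sketch, assuming ordinary least squares on the transformed variable u = 1/X', the constants a and b of a curve of the form rel-variance = a + b/X' could be obtained as follows. The function names and the data points are hypothetical.

```python
def fit_variance_curve(estimates, rel_variances):
    """Least-squares fit of rel_variance = a + b / estimate.

    Assumption: a plain linear regression of rel-variance on u = 1/estimate.
    """
    u = [1.0 / x for x in estimates]
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(rel_variances) / n
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, rel_variances))
    b = s_uv / s_uu
    a = v_bar - b * u_bar
    return a, b


def read_curve(a, b, estimate):
    """Rel-variance read from the fitted curve for an estimate of given size."""
    return a + b / estimate


# Hypothetical (estimate, rel-variance) pairs lying exactly on a = 0.0001, b = 50.
sizes = [1000.0, 5000.0, 20000.0, 100000.0]
observed = [0.0001 + 50.0 / x for x in sizes]
a_hat, b_hat = fit_variance_curve(sizes, observed)
```

Reading variances from one smooth curve, instead of computing each separately, is what keeps the published variances consistent among themselves.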


Estimating  Sampling  Variances 
From  Survey  Data 


For calculation of variances from sample data, the universe is divided into the 4 sectors displayed in table 1. The contribution of each sector to over-all variance is computed separately.

For sectors I and III the sampling ratio in the first stage of selection is unity. Accordingly, the between-PSU component of variance for these sectors is zero.

Table 1. Sectors for use in calculating variances for a calendar quarter

Sector   Sector name                                   Number     Number of
number                                                 of PSU's   segments

I        Self-representing SMA's                           77        703
II       Nonself-representing SMA's                        13         63
III      Self-representing urban and rural PSU's           28         90
IV       Nonself-representing urban and rural PSU's       234        692

The general scheme of estimating within-PSU variance for these sectors is the random group method previously mentioned. It will be illustrated for sector I, the self-representing Standard Metropolitan Areas.

The segment is the unit of sampling within the PSU's, and accordingly is made the basis of calculations of within-PSU variance. The 703 segments in the sector are divided into 8 groups whose membership of approximately 88 segments each is randomly determined. The selection process is controlled so that each group has its proportionate part of each of the types of segments in the sector. The numbers of segments by types are:

Type                       Number of segments

Total                      703
Central City               354
Urban fringe               156
Other urban places          30
Rural                       84
New construction areas      79

Inflated totals for a characteristic Y for each group are established, with the summarizing operations:

$$Y'_g = \sum_{i=1}^{k} Y'_{gi} ,$$

where $Y'_{gi}$ is the estimate for that part of the universe which is represented by the i-th segment in the group, and k is the number of segments in the g-th group. The variances of $Y'_g$ and of $Y'$ are calculated as are those for $X'_g$ and $X'$ respectively on page 26, so that

$$S^2_{Y'} = \frac{m}{m-1} \sum_{g=1}^{m} \left( Y'_g - \bar{y}' \right)^2 ,$$

where $\bar{y}'$ is the mean of the m values $Y'_g$.


The  contribution  of  sector  III  is  calculated  in 
the  same  manner. 
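
The random group calculation for a self-representing sector can be sketched in Python as below. This is a simplified illustration under stated assumptions: segments are assigned to the 8 groups by a plain shuffle, whereas the survey controls the assignment so that each group receives its proportionate share of each segment type. The function names, the seed, and the input values are hypothetical.

```python
import random


def variance_from_group_totals(group_totals):
    """Estimated variance of the sector total: m times the variance of the m group totals."""
    m = len(group_totals)
    mean_total = sum(group_totals) / m
    s2_group = sum((y - mean_total) ** 2 for y in group_totals) / (m - 1)
    return m * s2_group


def random_group_variance(segment_estimates, m=8, seed=0):
    """Shuffle segment-level inflated estimates into m groups; return (total, variance)."""
    shuffled = list(segment_estimates)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled segments round-robin so group sizes differ by at most one.
    group_totals = [sum(shuffled[g::m]) for g in range(m)]
    return sum(group_totals), variance_from_group_totals(group_totals)


# Hypothetical check: identical segment estimates give zero estimated variance.
total, variance = random_group_variance([1.0] * 16, m=8)
```

With 703 segments and m = 8, the round-robin deal gives groups of 87 or 88 segments, matching the "approximately 88" in the text.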

For the nonself-representing sectors, an ultimate cluster technique is employed in calculating variances. Further, since there is but 1 PSU in each stratum, the strata are grouped into pairs, placing similar strata in the same pair. This process is described as a collapsed strata technique. It is illustrated for the nonself-representing urban and rural PSU's.

Data for the 234 PSU's are consolidated into 117 pairs, $Y'_g$ being the estimated total for that part of the population represented by the g-th pair and $Y'_{gi}$ being that part of the total estimated by the i-th PSU in the g-th pair. Variance of $Y'_{gi}$ is estimated as

$$S^2_{Y'_{gi}} = \frac{1}{L_g - 1} \sum_{i=1}^{L_g} \left( Y'_{gi} - \frac{P_{gi}}{P_g} Y'_g \right)^2 ,$$

where $P_{gi}$ is 1950 population of the i-th PSU, $P_g$ is 1950 population of the g-th group, and $L_g$ is the number of PSU's in each group. Since all groups contain 2 PSU's, $L_g$ is a constant equal to 2. Further, since $Y'_g = Y'_{g1} + Y'_{g2}$, the sampling variance of $Y'_g$ is

$$S^2_{Y'_g} = 2 S^2_{Y'_{gi}} ,$$

and the total for the sector, $Y'$, has variance

$$S^2_{Y'} = \sum_{g=1}^{117} S^2_{Y'_g} .$$

The same procedure is used for sector II.

Variance of the survey total is simply the sum of the variances for the 4 sectors.
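
A minimal Python sketch of the collapsed strata calculation follows. It assumes the usual form of that estimator, in which each PSU estimate in a pair is compared with its population-proportionate share of the pair total, the squared deviations are summed and divided by L - 1, and the result is multiplied by L = 2. That form, the function names, and the numbers are assumptions for illustration, not a transcription of the survey's computing procedure.

```python
def collapsed_pair_variance(y1, y2, p1, p2):
    """Variance contribution of one pair of collapsed strata (L = 2)."""
    pair_total = y1 + y2
    pair_population = p1 + p2
    n_in_pair = 2
    s2 = sum(
        (y - p / pair_population * pair_total) ** 2
        for y, p in ((y1, p1), (y2, p2))
    ) / (n_in_pair - 1)
    return n_in_pair * s2


def sector_variance(pairs):
    """Sum the pair contributions; `pairs` holds (y1, y2, p1, p2) tuples."""
    return sum(collapsed_pair_variance(*pair) for pair in pairs)


def survey_variance(sector_variances):
    """Variance of the survey total: the sum over the 4 sectors."""
    return sum(sector_variances)


# Hypothetical pair: equal 1950 populations, PSU estimates of 120 and 80.
one_pair = collapsed_pair_variance(120.0, 80.0, 100.0, 100.0)
```

Pairing similar strata matters because any real difference between the two strata in a pair is counted as if it were sampling variation, so the estimate tends to overstate the variance.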


Measurement  Error 


Measurement error can be divided into components in a variety of ways. One useful scheme is to separate it into bias and nonsampling variance. Nonsampling variance has in turn many components. Among these are variations which have their source in respondent, interviewer, classifier, editor, or tabulator. The method of estimating sampling variance which is used in the health survey includes most of the measurement variations, although it does not include those components of variation which are unaffected by the size of the sample. With some exceptions (found in edits for consistency), the biases of measurement, from whatever source, are not treated in the present report.

The main text of the report, on page 19, lists several routes being taken, all intended to improve evaluation of measurement error. An ultimate goal is establishment of a model for analyzing over-all error and its components, and for guidance toward efficient use of resources in minimizing total error.
APPENDIX IV

STRATIFICATION  OF  PRIMARY  SAMPLING  UNITS 


Principles 

A twin objective of many sampling designs is a pattern in which individual primary sampling units are as internally heterogeneous as possible, and in which each stratum formed from grouped PSU's is as homogeneous with respect to PSU's as possible. Said in another way this means that ultimate sampling units within a PSU should tend to be unlike one another, but that PSU's within a stratum should tend to be alike one another. This twin objective was sought in the Health Household-Interview Survey.

Three broad specifications of the survey molded the main outlines of the modes of stratification in the NHS. These were: (1) The requirements of end product which were that separate estimates be prepared for major Standard Metropolitan Areas, for a number of geographic sectors of the country, and for differing densities of population (metropolitan, other urban, and rural). (2) For administrative reasons, and in order to minimize operating costs, stratification in the NHS was to be coincident, insofar as feasible, with that of the Census Bureau's Current Population Survey. (3) The general characterization of the stratifying process was that it produce relatively homogeneous socioeconomic classes of PSU's, with this term being further interpreted to reflect geographic location, density of population, rate of population increase between 1940 and 1950, proportion of nonwhite, type of industry in predominantly urban areas, and type of farming in the rural areas.

Within these specifications, the approximately 1,900 PSU's were classified into 372 strata, the following rules serving as principal further guides in the process. In each case the rule is presented as a positive statement, although obviously there had to be some compromises among rules in order to produce a desirable result.

1. Except where a single PSU was larger than an average stratum (size being measured here as elsewhere in the stratification process by 1950 population), strata were of approximately the same size. This meant about 300,000 persons to a stratum.

2. Since the general design called for sample selection of a single PSU from each stratum with probability proportionate to size, each PSU with the population above a lower cutoff became, by itself, a self-representing stratum. The effect of all rules was to set this cutoff at 400,000 (1950 population).

3. Also included as self-representing or certainty areas were any Standard Metropolitan Areas with the population somewhat less than the cutoff, but within 100 miles of an SMA above the cutoff. The rationale was that the same field organization which served the larger city could also serve the other, and thus reduce costs.

4. Solution of the allocation problem (page 31) led to the conclusion that a nonself-representing Tab Area (that is, a Tab Area not made up entirely of self-representing PSU's) should contain not less than 4 sample PSU's if it were a Tab Area of Standard Metropolitan Areas, and not less than 8 sample PSU's otherwise. This meant in turn that such Tab Areas would contain corresponding minimum numbers of strata and this fact influenced ultimately the number of different strata which were formed.

5. Since end-product specifications required, for purposes of comparative analysis, both urban and rural Tab Areas within each geographic section, it was decided to make the first stage of sample selection identical for the other urban and the rural Tab Area within the section. Thus, each PSU drawn from other than Standard Metropolitan Areas became the first-stage unit for 1 urban Tab Area and 1 rural Tab Area, and 2 sets of ultimate stage units or segments (1 for each Tab Area) were drawn from each such PSU. This step had to be taken into consideration later in calculating variances, since first-stage selection for these Tab Areas was not independent.

6. Stratification proceeded in a sequential manner: tentative classification with respect to 1 major specification or rule being followed by tentative subclassification by a second rule and then by further subclassification by a third. As the process continued, occasional changes in the first tentative classifications had to be made. After semifinal stratification was completed, there was a review of results, and a few subjective changes made which reviewers thought would increase socioeconomic homogeneity between PSU's within strata. This introduction of judgment in the stratifying phase of the survey could, of course, produce no bias. If it was well done, it reduced sampling variance; if it was poorly done, at worst it would increase variance.
Results


As indicated, the principles, specifications, and rules led to a classification of the approximately 1,900 PSU's into 372 strata. Of these, 110 are composed of a single self-representing PSU. Collectively, these 110 strata represent 52 percent of the population in the universe. For them there is no between-PSU component of variance. The remaining 262 strata vary a great deal among one another, some being metropolitan, some urban, some rural, and all obviously exhibiting still other differing features as a consequence of the stratification. Even so, 3 examples of actual strata formed may contribute to a "feel" for the nature of nonself-representing strata in the health survey.


Example A. Sparsely populated stratum

PSU's (defined by counties)      Preliminary 1950 population

Total                            254,235
Coconino, Ariz.                   23,755
Dona Ana, N. Mex.*                39,044
Graham, Ariz.                     13,018
San Juan, N. Mex.                 18,116
Valencia, N. Mex.                 22,574
Navajo, Ariz.                     29,263
Uintah, Utah                      10,259
Alamosa-Costilla, Colo.           16,572
Mineral-Rio Grande, Colo.         13,330
Montezuma, Colo.                   9,937
Montrose, Colo.                   15,024
Pinal, Ariz.                      43,343

In  each  of  the  three  examples,  the  starred  PSU 
represents  the  stratum  in  the  sample. 


Example B. Moderately densely populated non-Metropolitan stratum

PSU's (defined by counties)      Preliminary 1950 population

Total                            315,623
Harris-Heard-Troup, Ga.*          68,008
Florence-Marion, S. C.           112,208
Baldwin-Jones-Twiggs, Ga.         45,580
Clarendon-Sumter, S. C.           89,827

Example C. A nonself-representing SMA stratum

Standard Metropolitan Area       Preliminary 1950 population

Total                            301,706
Springfield, Mo.                 104,118
Sioux City, Iowa                 103,959
St. Joseph, Mo.*                  93,629
APPENDIX V

THE  SAMPLING  ALLOCATION  PROBLEM 


Leading  Considerations. 

A fundamental fact which conditions the design of a multipurpose survey and the allocation of resources is that no single factor will determine uniquely the design, but rather a balance must be sought taking into consideration leading objectives. In planning the Health Household-Interview Survey, leading considerations were identified as follows.

1. The survey was expected to provide separate estimates for a number of geographic sections and for metropolitan, urban, and rural sectors. This condition was converted initially to a provision that separate worksheet estimates be produced for each of the defined 41 Tab Areas, although the Tab Areas would be consolidated into a lesser number of groups for most purposes.

2. A household survey was predicated, which in the United States ordinarily means a multistage area design.

3. Tentative determination had been reached as to target sampling tolerances for estimates which were to come from the survey.

4. Preliminary study of requirements and review of probable administrative and operating costs strongly suggested that initially the structure of the health survey should parallel in large measure the Current Population Survey (CPS) which was also a general-purpose survey of households. Significant savings might be possible if the 2 surveys were companion undertakings.

5. The survey was to be a continuing activity, geared to production at quarterly intervals of national estimates of characteristics of high incidence, and production of other statistics for the Nation and for parts of the Nation at annual intervals.

6. Appropriations set budget limitations on the design.

Outline  of  Design  Solution 

The specifications suggested that equal reliability be sought for estimates for each Tab Area. The target tolerances and previous design experience suggested further that a multistage survey could be designed which would meet requirements and which would contain a possible 700 to 1,200 households per year per Tab Area.

Experience with CPS indicated that a total of 300 or more strata with 1 sample PSU in each stratum were desirable. Since the principle had been adopted that the 2 surveys were to be companion activities, and since the CPS was operating with 330 strata, it was decided as a first step to adopt tentatively the CPS stratification for the NHS. This tentative decision was reviewed and modified in a later step.

The budget factor was now introduced. For the tentative design, which was beginning to shape up, it seemed that about 36,000 households per year, or a little under 900 per year per Tab Area, was feasible.

At this point, the precision requirements for each tabulation area were considered in terms of the components of variance. The set of strata for CPS in each tabulation area was examined to see if they were adequate to meet precision requirements for the Tab Areas. In the areas in which the minimum stratum requirements did not appear to be satisfied, additional strata were created, thus bringing more PSU's into the sample. In some cases this was accomplished by splitting an existing stratum into 2 parts, letting the PSU which is in the Current Population Survey represent the part of the stratum in which it falls and selecting a new PSU in the other part. In other cases, it was necessary to rearrange some strata to prevent great variation in strata sizes or in the urban-rural composition of a stratum. In such cases new PSU's were selected, and as a result 69 of the PSU's for the CPS are not included in the NHS. An additional 111 PSU's not in the CPS were selected for the NHS sample.

A principal tool utilized in carrying out the analysis indicated in the previous paragraph is expressed in the approximate relationship

$$V^2_{X'} = \frac{V^2_B}{m} + \frac{V^2_W}{n} ,$$

where

$V^2_B$ is between-PSU rel-variance in the population,
$V^2_W$ is within-PSU rel-variance in the population,
$V^2_{X'}$ is sampling rel-variance of an estimated characteristic,
m is the number of PSU's in the sample for a Tab Area, and
n is the number of households in the sample for a Tab Area.

Values of $V^2_B$ and $V^2_W$ were calculated for a number of household statistics from the CPS and other surveys. $V^2_{X'}$ and n were set from first appropriations set by joint consideration of target tolerances and budget. For each of the several household statistics a value of m was calculated, using the above equation, for nonself-representing strata. Using "typical" solutions, this step determined the needed number of PSU's in each Tab Area and consequently the number of strata which should be established, since 1 PSU was to be drawn from each stratum.
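
To make the calculation of m concrete, the sketch below solves a relationship of the form (sampling rel-variance) = (between-PSU rel-variance)/m + (within-PSU rel-variance)/n for m, rounding up to a whole number of PSU's. The function name and every numerical value are hypothetical; only the 900 households per Tab Area echoes the planning figure mentioned in this Appendix.

```python
import math


def required_psus(v2_between, v2_within, n_households, v2_target):
    """Smallest whole m with v2_between / m + v2_within / n_households <= v2_target."""
    remainder = v2_target - v2_within / n_households
    if remainder <= 0:
        # The within-PSU term alone already exceeds the target tolerance.
        raise ValueError("target rel-variance cannot be met with this many households")
    return math.ceil(v2_between / remainder)


# Hypothetical rel-variance components for one household statistic.
m_needed = required_psus(
    v2_between=0.004,
    v2_within=0.9,
    n_households=900,
    v2_target=0.0016,
)
```

Repeating this for several statistics and taking a "typical" answer is one way to read the step described above; the report does not spell out how the typical solution was chosen.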

Result 

The consequence of these actions is the health survey sample design, which was planned to have 372 strata, 372 PSU's, 41 Tab Areas, and 36,000 households with 115,000 persons in it each year.

As noted elsewhere in the report, the original allocation of resources will be modified as consumer interest and experience dictate.
APPENDIX VI


ILLUSTRATION  OF  DRAWING  PSU’S  AND  HOUSEHOLDS 
INTO  THE  SAMPLE 


Selection  of  Primary  Sampling  Units 

Section 5 of this report outlines the main features of sample selection in the health survey. This Appendix illustrates the principal steps of that process.

Assume a particular stratum contains 4 primary sampling units, or PSU's. These are listed, together with their 1950 population, and cumulated population, as in table 2.


Table 2. Primary sampling units in stratum number 428

PSU                          1950           Cumulative
                             population¹    1950 population

Cedar Rapids, Iowa, SMA      104,000        104,000
Lincoln, Nebr., SMA          118,000        222,000
Topeka, Kans., SMA           104,000        326,000
Waterloo, Iowa, SMA           99,000        425,000

¹Preliminary and approximate population figures are used in this example.


A random number between 1 and 425,000 is selected. Assume the number is 301,265. This number selects Topeka, Kans., as the sample PSU from stratum 428.*
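
The selection just described, probability proportionate to size by way of cumulated populations, can be sketched in Python as follows. The function name is an assumption for illustration; the populations and the random number 301,265 are those of the example. In practice the number would be drawn at random (for instance with `random.randint(1, 425000)`), and the refinements used in the actual selection are not modelled.

```python
from bisect import bisect_left
from itertools import accumulate


def select_psu(psus, random_number):
    """Pick the PSU whose cumulated-population interval contains random_number.

    `psus` is a list of (name, population) pairs; random_number runs from 1
    to the stratum total, so each PSU's chance is proportionate to its size.
    """
    names = [name for name, _ in psus]
    cumulative = list(accumulate(pop for _, pop in psus))
    if not 1 <= random_number <= cumulative[-1]:
        raise ValueError("random number outside the cumulated population range")
    # First PSU whose cumulative population is at least the random number.
    return names[bisect_left(cumulative, random_number)]


stratum_428 = [
    ("Cedar Rapids, Iowa", 104_000),
    ("Lincoln, Nebr.", 118_000),
    ("Topeka, Kans.", 104_000),
    ("Waterloo, Iowa", 99_000),
]
chosen = select_psu(stratum_428, 301_265)
```

The number 301,265 lies between the cumulated totals 222,000 and 326,000, which is why it falls to Topeka.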


*In three respects, the example is a streamlined
version of detailed selection. (1) Where the stratum
in the health survey and in the Current Population
Survey were identical, the PSU drawn earlier for the
CPS was used also in NHS. (2) Where a CPS stratum
was divided into 2 strata in NHS, an unbiased
selection procedure retained the CPS PSU for one of
the new strata. (3) Those PSU's which are found also
in CPS were selected initially with probability
proportional to size, and also under restrictions of
the Goodman-Kish controlled selection technique
which increases the probabilities of selection for
preferred combinations of units.
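
The draw illustrated in table 2 can be sketched in a few lines of Python. This is a minimal illustration of selection with probability proportional to size under the streamlined assumptions of the example; the function name `select_psu` is hypothetical, and the CPS carry-over and controlled-selection restrictions described in the footnote are ignored.

```python
from bisect import bisect_left
from itertools import accumulate

# PSU names and approximate 1950 populations, taken from table 2.
PSUS = [
    ("Cedar Rapids, Iowa, SMA", 104_000),
    ("Lincoln, Nebr., SMA", 118_000),
    ("Topeka, Kans., SMA", 104_000),
    ("Waterloo, Iowa, SMA", 99_000),
]


def select_psu(units, random_number):
    """Return the unit whose cumulative range contains random_number (1-based)."""
    cumulative = list(accumulate(size for _, size in units))
    if not 1 <= random_number <= cumulative[-1]:
        raise ValueError("random number must lie between 1 and the stratum total")
    # bisect_left finds the first cumulative total that is >= random_number.
    return units[bisect_left(cumulative, random_number)][0]


selected = select_psu(PSUS, 301_265)
print(selected)
```

With the random number 301,265 of the example, the function returns Topeka, because 301,265 lies between the cumulative totals 222,001 and 326,000.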


Selection  of  Enumeration 
Districts  and  Segments 

The  exact  procedure  for  selecting  segments 
varies  depending  on  whether  the  Tab  Area  involved 
is  a Standard  Metropolitan  Area,  an  "other  urban" 
area, or a rural area, but the nature of the procedure
is  the  same  for  all  areas.  It  will  be  described  for  a 
typical  metropolitan  Tab  Area  for  which  not  all 
first-stage  sampling  units  were  self-representing; 
i.e.,  for  a Tab  Area  in  which  there  is  more  than  1 
PSU  in  the  sample.  In  following  this  selection  proc- 
ess it  is  useful  to  remember  that  the  final  sample 
of  households  and  persons  is  intended  to  be  self- 
weighting within  the  Tab  Area,  which  means  that 
every  household  in  the  Tab  Area  has  an  equal 
chance  of  being  selected. 

Assume  that  this  Tab  Area  has  5 PSU's  in  the 
sample, 3 of which are self-representing, and 2 of
which  are  not.  Since  the  over-all  design  has  an 
average  annual  sampling  rate  of  about  1 in  1,400 
and  since  144  segments  are  to  be  selected  from  the 
Tab  Area,  assume  this  typical  Tab  Area  contains 
an  estimated  200,000  segments  in  the  population 
(page  13).  More  precisely,  the  assumption  is  that 
the  Tab  Area  contains  200,000  size  measures, 
where  a size  measure  is  equal  to  6 households, 
and  the  number  of  size  measures  is  the  number  of 
households  in  1950  divided  by  6. 

The  first  step  is  to  allocate  the  144  sample 
segments  to  the  5 sample  PSU's.  This  is  done  in 
proportion  to  the  estimated  size  of  the  stratum 
represented  by  the  PSU.  For  example,  if  a partic- 
ular sample  PSU  contains  5,000  size  measures, 
and  was  drawn  from  a stratum  containing  25,000 
size  measures,  it  represents  those  25,000  size 
measures  in  the  sample  and,  therefore,  represents 
one-eighth part [25,000 divided by 200,000] of the
population  in  the  Tab  Area.  Therefore  1/8  of  144, 
or  18  segments  are  assigned  to  that  PSU.  In  order 
to  facilitate  continuous  sampling,  and  to  reduce 
costs  by  having  samples  in  adjacent  quarters  also 
geographically  neighboring,  4 quarterly  samples 
are  drawn  simultaneously,  as  sketched  in  the  next 
paragraph.  Accordingly,  the  18  segments  are  di- 
vided among  the  4 quarters,  so  that  either  4 or  5 
segments  will  appear  in  each  quarter. 
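
The allocation arithmetic of this paragraph can be sketched as follows. The function names are hypothetical, and the rule for deciding which quarters receive the extra segment is an assumption made only for illustration; in the survey that balance is kept through memorandum records.

```python
from fractions import Fraction


def allocate_segments(stratum_size, tab_area_size, tab_area_segments):
    """Segments for a sample PSU, in proportion to the stratum it represents."""
    return Fraction(stratum_size, tab_area_size) * tab_area_segments


def split_by_quarter(segments, quarters=4):
    """Spread a whole number of segments as evenly as possible over the quarters.

    Assumption: the earlier quarters take the extra segments.
    """
    base, extra = divmod(segments, quarters)
    return [base + 1 if q < extra else base for q in range(quarters)]


# Stratum of 25,000 size measures in a Tab Area of 200,000, with 144 segments.
psu_segments = allocate_segments(25_000, 200_000, 144)
per_quarter = split_by_quarter(int(psu_segments))
print(psu_segments, per_quarter)
```

For the example PSU this gives 18 segments for the year, divided into two quarters of 5 segments and two quarters of 4.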
The next step is to localize the sample into
areas  smaller  than  the  PSU.  For  this  purpose  the 
enumeration  district,  or  ED,  is  utilized.  ED's, 
used  as  administrative  and  tabulating  cells  in  the 
1950  Census,  vary  greatly  in  size,  but  usually  con- 
tain not  less  than  10  or  more  than  150  size  meas- 
ures. Assume  in  this  illustration  that  the  selected 
PSU  with  5,000  size  measures  contains  50  ED's. 

For  the  PSU  of  the  example,  4 segments  will 
be  required  in  some  quarters  and  5 in  others.  The 
larger  of  these  numbers  is  identified  as  the  num- 
ber of  "starting  points."  Thus  in  this  PSU  there 
are  5 starting  points.  It  is  intended  that  these 
starting  points  be  distributed  randomly,  but  sys- 
tematically throughout  the  PSU  and  that  they  serve 
as  selectors  of  ED's  and  segments  for  the  first 
quarter.  This  is  done  in  the  following  manner.  The 
ED's  are  arranged  in  systematic  sequence  with  all 
central  city  ED's  listed  first,  followed  by  all  urban- 
ized fringe  ED's,  and  then  by  other  urban  ED's,  and 
finally  by  rural  ED's. 

The  first  starting  point  is  determined  by  choos- 
ing a random  number  between  1 and  1,000  [5,000 
size  measures  in  the  PSU  divided  by  5,  the  number 
of  starting  points].  Say  this  number  is  725.  Then 
that  listed  ED  which  contains  the  725th  cumulated 
size  measure  is  included  in  the  sample,  as  are  also 
ED's  with  the  1,725th,  2,725th,  3,725th,  and  4,725th 
cumulated  size  measures. 

Consider  the  ED  with  the  725th  size  measure. 
Suppose  it  contained  100  size  measures,  identified 
in  the  cumulated  listing  as  numbers  705  through 
804.  The  process  just  described  locates  the  first 
starting  point  then  not  only  in  this  particular  ED, 
but  at  the  21st  size  measure  [random  number  725, 
minus 705, plus 1].
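
The location of starting points lends itself to a short sketch. The code below is only an illustration of systematic selection with a random start, using the figures of the example; the function names are hypothetical.

```python
def starting_points(total_size_measures, n_points, random_start):
    """Systematic selection: a random start followed by equal intervals."""
    interval = total_size_measures // n_points
    if not 1 <= random_start <= interval:
        raise ValueError("random start must lie between 1 and the interval")
    return [random_start + k * interval for k in range(n_points)]


def position_within_ed(hit, first_size_measure_in_ed):
    """Position of the selected size measure inside its enumeration district."""
    return hit - first_size_measure_in_ed + 1


# 5,000 size measures in the PSU, 5 starting points, random number 725.
points = starting_points(5_000, 5, 725)
# The ED holding the 725th size measure covers numbers 705 through 804.
offset = position_within_ed(points[0], 705)
print(points, offset)
```

The five hits are the 725th, 1,725th, 2,725th, 3,725th, and 4,725th cumulated size measures, and the first falls on the 21st size measure of its ED.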

Making use of Sanborn* and other detailed maps,
the  ED  then  is  "segmented"  on  a new  map  into 
100  units  approximately  equal  in  size  (i.e.,  in  the 
number  of  expected  households).  These  units  are 
numbered  consecutively  from  1 through  100  in  a 
systematic  fashion  beginning  with  a randomly  lo- 
cated start.  The  unit  or  segment  numbered  21, 
containing  an  expected  6 households,  becomes  a 
sample  segment  for  the  first  quarter  of  inter- 
viewing. This  same  procedure  is  carried  out  for 
other chosen ED's in the PSU and for other sample
PSU's  in  the  Tab  Area. 

It  will  be  noticed  that,  because  some  numbers 
are  not  exactly  divisible  by  others,  in  the  example 
PSU  5 rather  than  the  calculated  4.5  segments  are 
interviewed  in  the  first  quarter.  Memorandum  rec- 
ords are  maintained  so  that  over  the  Tab  Area 
exactly  1/4  of  144  or  36  segments  are  interviewed 
each  quarter. 


*Published by the Sanborn Map Co., New York,
In the example ED, the 21st segment was interviewed
the first quarter. For the second quarter, the
[725 + 1,000/4] = 975th segment is in the sample;
the 1,225th in the third quarter, and the 1,475th in
the  fourth  quarter;  except  that  the  memorandum 
record  again  is  used  to  assure  that  only  18  seg- 
ments from  the  PSU  are  included  over  the  year. 
In  the  following  year,  segments  are  selected  in  such 
a manner  that  they  are  geographically  neighboring 
the  segments  in  the  first  sample  at  about  the  same 
time  of  the  year. 
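
The quarterly positions quoted above follow from dividing the interval between starting points by the number of quarters. A minimal sketch, with a hypothetical function name, and leaving aside the memorandum-record adjustment that holds the PSU to 18 segments for the year:

```python
def quarterly_positions(first_hit, interval, quarters=4):
    """Cumulated positions used in successive quarters from one starting point."""
    step = interval // quarters
    return [first_hit + q * step for q in range(quarters)]


# Starting point 725, interval of 1,000 size measures between starting points.
positions = quarterly_positions(725, 1_000)
print(positions)
```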

Thus  it  is  that  over  the  year,  for  the  stratum 
from  which  the  example  PSU  comes,  the  probability 
that  any  segment,  household,  or  person  is  in  the 
sample  is  the  product  of  the  probability  of  selecting 
this  particular  PSU  (5,000  divided  by  25,000)  times 
the  probability  of  selecting  a particular  segment 
within  the  PSU  (18  divided  by  5,000);  or  in  other 
words  is  1/5  times  18/5,000,  which  is  0.00072. 
By virtue of the way in which the sample was
distributed, this is exactly the designed over-all
sampling portion, 144/200,000, for the Tab Area.
The  probability  for  any  person  from  the  example 
Tab  Area  appearing  in  a given  quarter  is  approx- 
imately 0.00018. 
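
The probability statement can be checked with exact fractions. The variable names below are illustrative only; the figures are those of the example.

```python
from fractions import Fraction

p_psu = Fraction(5_000, 25_000)      # PSU selected within its stratum
p_segment = Fraction(18, 5_000)      # segment selected within the PSU, per year
p_year = p_psu * p_segment           # over-all annual probability
designed_rate = Fraction(144, 200_000)
p_quarter = p_year / 4               # approximate chance in a given quarter

print(float(p_year), float(designed_rate), float(p_quarter))
```

The product equals the designed rate of 144 in 200,000, or 0.00072, which is roughly the 1 in 1,400 quoted for the survey as a whole.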

Variations  of  Detail 

The  principles  of  selection  were  uniform 
throughout  the  survey.  Depending  upon  the  partic- 
ular areas  which  fell  into  the  sample  and  upon 
the  types  of  resources  available  for  those  areas, 
additional  steps  sometimes  were  taken  in  the  se- 
lection process.  For  example,  detailed  block  sta- 
tistics were  available  for  many  cities.  In  these 
cases,  a selection  of  blocks  proportional  to  size 
was  made  within  sample  ED's  before  making  a 
direct  selection  of  segments.  In  some  instances  a 
block  was  further  subdivided  and  subsampled  be- 
fore final  selection  of  segments.  If  it  was  found 
from  a Sanborn  map  or  other  source  that  the  pro- 
spective ultimate  sampling  unit  was  a large  apart- 
ment building,  still  another  stage  of  subsampling 
was  introduced  to  bring  the  final  unit  closer  to  an 
expected  6 households. 

In  some  cases,  the  selection  of  samples  in 
Washington  results  in  the  inclusion  of  a segment 
in  which  the  field  lister  or  interviewer  finds  many 
more  than  6 households.  This  may  occur  because 
of  new  construction  unknown  in  Washington,  or  be- 
cause sampling  materials  were  incomplete  or  in- 
accurate. In  instances  in  which  the  segment  ob- 
viously appears  to  contain  more  than  20  households, 
field  manuals  give  detailed  instructions  for  sub- 
sampling the  segment  and  interviewing  only  the 
subsample, in a manner which reduces costs but
avoids  introduction  of  bias.  A price  of  slightly  high- 
er variance  is  paid  whenever  this  becomes  neces- 
sary. 
APPENDIX VII

RANDOMIZING  ASSIGNMENTS,  AREAS,  AND  WEEKS 


Basic  samples  in  the  health  survey  are  drawn 
to  represent  the  population  of  the  United  States 
over  a calendar  quarter.  It  is  efficient  in  terms  of 
operating  procedures  and  reduction  of  variance, 
and  furthermore,  desirable  in  terms  of  potentially 
available  end  product,  to  make  each  week's  col- 
lection a random  sample  of  the  population.  This  is 
done.  The  randomization  of  assignments,  areas, 
and  weeks  is  quite  an  elaborate  process.  To  follow 
the  process  through  in  all  its  detail  most  readers 
would  find  tedious.  For  this  reason,  a description 
is  given  by  means  of  an  example  which  exhibits 
leading  features  of  the  process  while  omitting  a 
number  of  lesser  details. 

Dimensions  of  the  Problem 

For  administrative  reasons  (which  in  the  main 
are  consistent  with  minimum  costs)  a given  inter- 
viewer operates  within  a single  geographic  section 
—with  a few  exceptions— and  usually  within  from 
1 to  4 contiguous  PSU's.  Consequently,  randomiza- 
tion of  assignments,  areas,  and  weeks  was  carried 
out  separately  within  each  geographic  section. 
In  this  process  the  8 largest  SMA's  were  ex- 
cluded from  the  sections  and  treated  separately. 

There  are  11  sections  in  the  country,  each 
divided  into  3 tab  areas:  metropolitan,  other  urban, 
and  rural.  Each  Tab  Area  contains  36  segments  for 
the  sample  for  a quarter,  and  thus  a section  has 
108  segments  each  quarter.  There  are  a total  of 
120  interviewers  to  cover  a grand  U.  S.  total  of 
1,476  segments  per  quarter  (including  the  8 largest 
SMA's).  Thus,  on  the  average,  1 interviewer  covers 
12  segments  per  quarter.  Excluding  the  8 largest 
SMA's,  the  108  segments  per  quarter  in  a section 
require  an  average  of  9 interviewers  for  the  section. 
A typical  assignment  for  an  interviewer  for  a week 
is  2 segments  or  an  expected  12  households  to  be 
interviewed,  although  an  assignment  may  consist 
of  either  1 or  3 segments.  An  interviewer  may  or 
may  not  have  an  assignment  in  a given  week.  She  nev- 
er has  more  than  1 assignment  in  a week.  Thus,  the 
typical  situation  in  a section  over  a quarter  encom- 
passes 54  assignments,  3 tab  areas,  and  13  weeks, 
with  6 assignments  per  interviewer,  although  the 
assignments  per  interviewer  may  range  from  3 to 
13. An effort is made to provide at least 1 assignment
to each interviewer each month, in order to avoid
having too great a time lapse between interviewing
experiences.
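
The dimensions quoted in the two preceding paragraphs are tied together by simple arithmetic, sketched here. The constant names are illustrative; every figure is taken from the text or from table 9.

```python
SECTIONS = 11
TAB_AREAS_PER_SECTION = 3        # metropolitan, other urban, rural
LARGE_SMAS = 8                   # each treated as a separate Tab Area
SEGMENTS_PER_TAB_AREA = 36       # per quarter
INTERVIEWERS = 120
SEGMENTS_PER_ASSIGNMENT = 2      # typical weekly assignment
HOUSEHOLDS_PER_SEGMENT = 6       # expected

tab_areas = SECTIONS * TAB_AREAS_PER_SECTION + LARGE_SMAS
segments_per_quarter = tab_areas * SEGMENTS_PER_TAB_AREA
segments_per_interviewer = segments_per_quarter / INTERVIEWERS

section_segments = TAB_AREAS_PER_SECTION * SEGMENTS_PER_TAB_AREA
section_interviewers = section_segments // 12   # 12 segments per interviewer
section_assignments = section_segments // SEGMENTS_PER_ASSIGNMENT
assignments_per_interviewer = section_assignments // section_interviewers
households_per_assignment = SEGMENTS_PER_ASSIGNMENT * HOUSEHOLDS_PER_SEGMENT

print(tab_areas, segments_per_quarter, round(segments_per_interviewer, 1))
print(section_segments, section_interviewers, section_assignments,
      assignments_per_interviewer, households_per_assignment)
```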

The objectives of intraquarter arrangements are:

1. Obtaining approximately equal representation
from each of the 3 Tab Areas in each
section in each week,

2.  Spacing  the  work  of  each  interviewer  at  ap- 
proximately even  intervals  over  the  quarter, 
and 

3.  Randomizing  assignments  (segments  to  be 
interviewed)  over  the  weeks  of  the  quarter. 

Principal  features  of  the  way  in  which  these 
objectives  are  reached  are  illustrated  in  the  fol- 
lowing numerical  example  of  a composite  geograph- 
ic section.  It  should  be  observed  that  there  is  no 
unique  way  of  accomplishing  the  objectives  and  that 
the  method  chosen  is  but  one  of  several  possible 
methods. 


Example 

This  geographic  section  contains  the  usual  3 
Tab  Areas:  metropolitan,  urban,  and  rural,  each  of 
which  has  36  segments  to  be  interviewed  over  the 
quarter.  Nine  interviewers  have  been  hired  for  work 
in  the  Census  Region  which  contains  the  section. 
The  Census  Regional  Offices,  of  which  there  are 
17,  have  indicated  for  each  of  the  interviewers  in 
which  of  the  20  PSU's  in  the  sample  in  the  section 
they  can  serve.  This  information  has  been  reported 
to  Washington  (table  3). 


Table 3. Interviewer service areas

Interviewer     Can serve in PSU(s) numbered

A               1, 2
B               3
C               4
D               5, 6, 7
E               8, 9
F               10, 11, 12
G               13, 14, 15
H               16, 17, 18
J               19, 20
Step I

Formation of assignments in each PSU.--The
sample  segments  within  each  PSU  are  arranged  in 
sequence  by  degree  of  urbanization  and  grouped 
into  assignments  of  2 segments  each  (with  1 assign- 
ment containing  either  1 or  3 segments  if  the  total 
number  of  segments  is  odd).  The  purpose  of  the 
grouping  is  to  put  unlike  segments  in  the  same  as- 
signments and  to  obtain  balance  between  urban  and 
rural  Tab  Areas  in  the  assignments.  The  process 
is  illustrated  in  table  4 for  a non-SMA  primary 
sampling  unit  which  contains  6 segments;  the  seg- 
ments connected  by  a line  being  members  of  the 
same  assignment. 


Table 4. Formation of assignments in PSU number 5

Segment number     Urbanization classification

5-1                Urban segment
5-6                Urban segment
5-2                Rural segment
5-9                Rural segment
5-14               Rural segment
5-11               Rural segment

[The lines connecting the members of each assignment are not reproduced.]


Thus 3 assignments are identified for this PSU. In
PSU's that are SMA, the arrangement is in sequence
by central city, urban fringe, other urban places,
and rural segments.

Step II

Determination of number of assignments for each
interviewer.--The number of assignments in each PSU
having been determined, the number of assignments
for each interviewer is established readily by
reference to the field report reflected in table 3.
A new worksheet, table 5, is set up combining these
two pieces of information. The first figure in each
cell is the identification number of the PSU and the
second figure is the number of assignments in that
PSU. The columns headed total number of assignments,
SMA, and non-SMA are utilized later in the
allocation process.


Table 5. Number of assignments for each interviewer

              Number of assignments        Total number of assignments
Interviewer   by PSU                       All      SMA      Non-SMA

A             1-2    2-3                     5        2         3
B             3-3                            3        3         -
C (1)         4-1                            1        -         1
D             5-3    6-3    7-2              8        3         5
E             8-3    9-4                     7        -         7
F             10-4   11-2   12-3             9        3         6
G             13-4   14-3   15-3            10        3         7
H             16-2   17-3   18-2             7        2         5
J             19-2   20-2                    4        2         2

(1) Interviewer C has in this example only 1 assignment
in the quarter for this section. She has additional
assignments in other PSU's in a neighboring section
which were assigned because the locations were more
accessible to her than to interviewers from the
other section.


Step III

Spacing interviewer assignments throughout the
quarter.--The next step is to distribute the number
of assignments by week throughout the quarter in
such a fashion that each interviewer's work is
spaced at approximately even intervals over the
quarter and so that the total number of assignments
is roughly constant from week to week. This step
is carried out on another worksheet shown in table 6.


Table 6. Spacing interviewer assignments by week

                        Week number
  1   2   3   4   5   6   7   8   9  10  11  12  13

  G   G   G   F   G   G   G   F   G   G   G   F   G
  F   F   D   D   F   F   D   X   F   F   D   X   D
  D   X   Y   X   D   X   Y   Y   D   X   Y   A   X
  Y   A   J   A   Y   A   J   A   J   Y   J   B   C
              B               B

The interviewer with the largest number of
assignments (Interviewer G, with 10 assignments
in this section) has her assignments located by
week  on  the  first  line  of  the  table.  Since  she  has 
work  in  10  of  the  13  weeks,  she  has  an  assignment 
in  each  week  except  for  3 evenly  spaced  and  ran- 
domly chosen  weeks.  Note  that  at  this  point  the 
identity  of  each  assignment  has  not  been  deter- 
mined, but  only  the  fact  that  Interviewer  G has  an 
assignment  in  the  specified  week. 
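
The placement of Interviewer G on the first line of table 6 can be sketched as follows. The rounding rule in `evenly_spaced` is an assumption chosen for illustration (the report does not state one), and the starting week 4 is simply read from table 6, where it was chosen at random.

```python
def evenly_spaced(count, n_weeks=13, start=1):
    """Return `count` distinct week numbers at roughly equal intervals from `start`."""
    if not 0 <= count <= n_weeks:
        raise ValueError("count must lie between 0 and n_weeks")
    return sorted(
        (start - 1 + (i * n_weeks) // count) % n_weeks + 1
        for i in range(count)
    )


N_WEEKS = 13
G_ASSIGNMENTS = 10

# The 3 weeks in which G has no assignment, evenly spaced from week 4.
g_off_weeks = evenly_spaced(N_WEEKS - G_ASSIGNMENTS, N_WEEKS, start=4)
g_work_weeks = [w for w in range(1, N_WEEKS + 1) if w not in g_off_weeks]
print(g_off_weeks, g_work_weeks)
```

Under these assumptions G is free in weeks 4, 8, and 12, which are the cells filled by Interviewer F on the first line of table 6.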

Then the interviewer with the next largest
number of assignments (Interviewer F, with 9
assignments) has her weeks of work posted to table
6.  This  is  done  by  entering  her  identification  in  9 
of  the  remaining  unfilled  cells  of  the  table,  taking 
care  to  fill  the  first  line  before  starting  on  the  sec- 
ond line  and  still  attempting  equal  spacing  of  the  9 
assignments. In particular, F is not allowed to have
2 assignments in the same week. This process
is  continued  for  each  interviewer  until  the  54  as- 
signments for  the  section  have  been  placed.  The 
assignment  of  Interviewer  C to  week  13  was  made 
with  consideration  being  given  also  to  timing  of  her 
assignments  in  the  neighboring  section. 

Interviewers E and H each have 7 assignments.
In table 6 designations X and Y have been used in
lieu of E and H without decision as to which is which.
This  decision  is  reserved  to  a later  point  in  order 
to  permit  greater  flexibility  in  placing  work. 

Step  IV 

Randomizing assignments.--The remaining
problem  is  to  match  specific  assignments  randomly 
with  weekly  allocations  of  workload  for  the  inter- 
viewers. An  important  side  condition  is  imposed 
on  this  process. 

As  nearly  as  possible  each  week's  sample  is 
kept  balanced  by  SMA  assignments  and  non-SMA 
assignments.  In  this  example,  with  54  assignments 
to  be  made  during  the  quarter,  either  1 or  2 SMA 
assignments will be made each week, either 2 or
3 non-SMA assignments, and a total of 4 or 5
assignments each week.
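
The weekly ranges stated here follow from the totals in table 5 and the SMA designations listed in table 7. The sketch below recomputes them; the names `TABLE5`, `SMA_PSUS`, `interviewer_totals`, and `weekly_split` are illustrative only.

```python
# Assignments per PSU for each interviewer (table 5), keyed by PSU number.
TABLE5 = {
    "A": {1: 2, 2: 3},
    "B": {3: 3},
    "C": {4: 1},
    "D": {5: 3, 6: 3, 7: 2},
    "E": {8: 3, 9: 4},
    "F": {10: 4, 11: 2, 12: 3},
    "G": {13: 4, 14: 3, 15: 3},
    "H": {16: 2, 17: 3, 18: 2},
    "J": {19: 2, 20: 2},
}
SMA_PSUS = {1, 3, 6, 12, 15, 16, 19}   # table 7
WEEKS = 13


def interviewer_totals(by_psu):
    """Return (all, SMA, non-SMA) assignment counts for one interviewer."""
    total = sum(by_psu.values())
    sma = sum(n for psu, n in by_psu.items() if psu in SMA_PSUS)
    return total, sma, total - sma


def weekly_split(n, weeks=WEEKS):
    """Return (low, high, weeks_at_high) when n is spread as evenly as possible."""
    low, extra = divmod(n, weeks)
    return low, low + (1 if extra else 0), extra


totals = {who: interviewer_totals(psus) for who, psus in TABLE5.items()}
all_total = sum(t[0] for t in totals.values())
sma_total = sum(t[1] for t in totals.values())
non_sma_total = sum(t[2] for t in totals.values())

print(all_total, sma_total, non_sma_total)
print(weekly_split(all_total), weekly_split(sma_total), weekly_split(non_sma_total))
```

The section totals are 54, 18, and 36, so that 2 weeks carry 5 assignments, 5 weeks carry 2 SMA assignments, and 10 weeks carry 3 non-SMA assignments.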

Assignments  first  are  made  tentatively,  and 
in  a few  instances  it  may  become  necessary  for  an 
assignment  which  has  been  allocated  to  one  inter- 
viewer to  be  reassigned  later  in  the  process  to 
another  interviewer  as  the  sequential  assignment 
process  reduces  degrees  of  freedom  in  allocating 
workloads.  Before  beginning  the  randomization  of 
assignments  one  needs  to  assemble  the  data  from 
tables  5 and  6 and  from  a new  table— table  7. 


Table 7. Designation of PSU's and assignments as SMA and non-SMA

Assignments in these PSU's      Assignments in these PSU's
are SMA segments                are non-SMA segments

1, 3, 6, 12, 15, 16, 19         2, 4, 5, 7, 8, 9, 10, 11, 13, 14, 17, 18, 20

Assignments  within  a PSU  are  then  identified 
by a letter prefixed by a PSU number; e.g., the 3
assignments in PSU Number 5 are 5a, 5b, and 5c.

Table  8 reflects  the  final  allocation  and  ran- 
domization of  assignments.  The  designation  in  the 
cell  indicates  the  interviewer  and  the  specific  as- 
signment to  her  in  that  week.  Procedure  for  filling 
in  the  table  is  outlined  in  the  remaining  paragraphs 
of  this  Appendix. 

The initial determination is the number of SMA
assignments for each week. As noted earlier, this must
be  either  1 or  2 for  each  week.  Which  weeks  get  2 
is  determined  randomly,  except  that  weeks  4 and  8, 
which  are  to  have  a total  of  5 assignments,  are 
given  2 SMA  assignments  each.  This  action  deter- 
mines also  the  number  of  non-SMA  assignments 
for  each  week  and  these  are  posted  to  table  8. 

Allocation is then made first for the SMA
assignment for week 1. Table 6 shows that
interviewers G, F, D, and Y are scheduled to work in the
first  week  and  the  single  SMA  allotment  could  be 
given  to  any  one  of  the  SMA  assignments  associated 
with  these  interviewers.  Interviewer  Y is  not  yet 
identified  as  to  whether  she  is  E or  H.  Collectively, 
G,  F,  D,  E,  and  H account  for  11  SMA  assignments. 
One  of  these  is  picked  at  random.  The  assignment 
picked  was  6b,  which  also  selects  interviewer  D. 
The  entry  D6b  is  posted  in  the  first  cell  in  week  1. 
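
The week 1 drawing can be sketched as a random choice from the pool of SMA assignments belonging to the interviewers scheduled that week. This is only an illustration: the names are hypothetical, the assignment labels are formed from tables 5 and 7 with the letter convention described above, and the later steps (removing drawn assignments, resolving X and Y, and reassigning work) are not modeled.

```python
import random

# SMA assignments by interviewer, built from tables 5 and 7.
SMA_ASSIGNMENTS = {
    "A": ["1a", "1b"],
    "B": ["3a", "3b", "3c"],
    "D": ["6a", "6b", "6c"],
    "F": ["12a", "12b", "12c"],
    "G": ["15a", "15b", "15c"],
    "H": ["16a", "16b"],
    "J": ["19a", "19b"],
}


def candidate_pool(interviewers, remaining=SMA_ASSIGNMENTS):
    """All (interviewer, assignment) pairs still open to the given interviewers."""
    return [(who, a) for who in interviewers for a in remaining.get(who, [])]


def draw_assignment(interviewers, rng, remaining=SMA_ASSIGNMENTS):
    """Pick one open SMA assignment at random."""
    return rng.choice(candidate_pool(interviewers, remaining))


# Week 1: G, F, D, and Y are scheduled; Y may be E or H, so both are listed.
week1 = ["G", "F", "D", "E", "H"]
pool = candidate_pool(week1)
interviewer, assignment = draw_assignment(week1, random.Random())
print(len(pool), interviewer, assignment)
```

The pool contains the 11 SMA assignments mentioned in the text; in the report's example the draw happened to be 6b, which selected Interviewer D.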

Two  SMA  assignments  are  required  for  the 
second  week,  to  be  given  interviewers  G,  F,  X,  or 
A.  The  assignments  are  next  selected  randomly 
from  the  SMA  assignments  available,  as  in  week  1. 
The assignments proved to be A1b and F12c. This
process  is  continued  for  successive  weeks. 

In  drawing  for  week  6,  assignment  16b  was 
selected,  and  thus  X was  determined  to  be  H,  and 
Y to  be  E. 

It  happened  that  when  week  12  was  reached 
only SMA assignments A1a, B3b, and B3c remained
available. Since B could not handle 2 assignments
in week 13, A1a was assigned to week 13, along with
B3b, which was drawn at random from B3b and B3c.
The remaining assignment, B3c, went to week 12.

When  the  SMA  assignments  had  been  allocated, 
the  non-SMA  allocations  were  undertaken,  begin- 
ning with  week  1,  and  using  the  same  procedure  as 
for  SMA  assignments. 

The  drawings  were  such  that  in  the  eighth  week 
a non-SMA  assignment  would  have  been  allotted  to 
interviewer  B.  However,  there  was  none  available 
to B, who had been given all her assignments
earlier; she served only SMA territory. Since she
had  served  in  lieu  of  G,  D,  or  H in  week  13  for  SMA 
assignment,  a random  non-SMA  assignment  from 
among  those  still  available  to  G and  D was  sub- 
stituted for  B in  week  8.  It  turned  out  to  be  D7b. 
Two  other  similar  changes  had  to  be  made  to  com- 
plete the  panel. 


Table 8. Final assignments

                                                      Week number
                              1     2     3     4     5     6     7     8     9     10    11    12    13    All weeks

                              D6b   F12c  J19b  F12b  G15c  H16b  D6a   F12a  D6c   F12a  G15a  B3c   A1a
                              G13a  A1b   G14b  B3a   E8a   G13d  J19a  H16a  G13b  G14c  J20b  F10a  B3b
                              F10c  G13c  D5a   A2a   D7a   F10d  E9b   E8b   F10b  H17a  D5b   H17c  F11b
                              E9c   H18a  E8c   H17b  F11a  A2c   G14a  A2b   J20a  E9a   H18b  E9d   C4a
                                                D5c                     D7b

Total SMA assignments         1     2     1     2     1     1     2     2     1     1     1     1     2     18
Total non-SMA assignments     3     2     3     3     3     3     2     3     3     3     3     3     2     36
Total assignments             4     4     4     5     4     4     4     5     4     4     4     4     4     54
APPENDIX VIII

SELECTED  STATISTICS  ABOUT  THE  SURVEY 


For  ready  reference,  and  for  their  value  in 
giving  quick  insight  to  various  features  of  the  health 
survey,  there  are  assembled  in  this  Appendix  sev- 
eral tables  of  statistics  on  the  survey  (tables  9-15). 
In most instances the figures which are shown are
rounded and approximate since they are intended
to convey an impression rather than to serve any
operational purpose. As a result detailed figures
are not always consistent with totals.


Table 9. Summary statistics on components of NHS

Item                                              Number

Counties and independent cities                    3,100
Primary sampling units in population               1,900
Primary sampling units in sample                     372
Strata                                               372
In national sample in 1 year:
  Persons                                        115,000
  Households                                      36,000
  Segments                                         6,000
Tab Areas                                             41
Large SMA's which are separate Tab Areas               8
Geographic sections                                   11


Table 10. Size of national sample for different time intervals

                         Number of units in
Type of unit       1 year      1 quarter     1 week

Persons           115,000       29,000        2,200
Households         36,000        9,000          700
Segments            6,000        1,500          115
PSU's                 372          372       about 60


Table 11. Size of sample over 1 year

                   Number of units in sample over 1 year
Type of unit       National     Each geographic     Each Tab
                   total        section             Area

Persons            115,000      10,500 (1)          2,800
Households          36,000       3,300 (1)            880
Segments             6,000         550 (1)            145
PSU's                  372          34 (1)            (2)

(1) Average.
(2) Urban and rural Tab Areas in a given sample are
represented by the same PSU. There is an average
of about 18 different sample PSU's for each of the
non-SMA, first-stage selections for Tab Areas.


Table 12. Approximate over-all sampling rates on an annual basis

                                               Approximate inflation factor
Sector                                         (reciprocal of over-all
                                               sampling rate)

U. S. total                                          1,400
New York SMA                                         4,700
Chicago SMA                                          2,000
Typical other large SMA                              1,000
Tab Area with highest sampling rate                    350
Tab Area with lowest sampling rate (NY)              4,700
Table 13. Data on field supervisors and interviewers

Item                                                    Amount or number

Number of field supervisors                             17
Number of interviewers                                  120
Typical interviewer workload in 1 week                  12 households
Typical interviewer workload over 1 quarter             72 households
Typical number of interviewers in a geographic
  section                                               9
Typical time required for interview of a household,
  including travel and call backs (but exclusive of
  supplemental inquiries)                               60 minutes

Table 14. Summary operations report on interviewing for 6 months' activity

Item                                                    Number or
                                                        percent (1)

Number of listings assigned for interview                 24,032
Number of listings demolished, vacant, or otherwise
  not eligible for interview (Types B and C
  exclusions)                                              3,251
Net number of listings eligible for interview             20,781

Noninterviews                                              1,271
  Percent of listings eligible                               6.1
  Percent refusal                                            1.2
  Percent other (not at home, etc.)                          4.9

Number of households with completed interviews            19,510

Number of persons in households with completed
  interviews                                              62,046

(1) Includes approximately 7.5 percent more households
than were designed for the basic survey; extra
households used in preparing estimates for one
part of the country.
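
The entries of table 14 are linked by simple arithmetic, which the following sketch reproduces. The variable names are illustrative; the persons-per-household ratio is a derived figure that does not appear in the table.

```python
listings_assigned = 24_032
not_eligible = 3_251          # Types B and C exclusions
noninterviews = 1_271
persons_interviewed = 62_046

eligible = listings_assigned - not_eligible
completed = eligible - noninterviews
noninterview_percent = round(100 * noninterviews / eligible, 1)
persons_per_household = persons_interviewed / completed

print(eligible, completed, noninterview_percent, round(persons_per_household, 2))
```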


Table 15. Primary sampling units by type

                                      Number of PSU's
Geographic area               Total (1)   Self-            Nonself-
                                          representing (1) representing

Total                           372         110              262

Boston SMA                        1           1                -
New York SMA                      1           1                -
Philadelphia SMA                  1           1                -
Pittsburgh SMA                    1           1                -
Detroit SMA                       1           1                -
Chicago SMA                       1           1                -
Los Angeles SMA                   1           1                -
San Francisco SMA                 1           1                -

Other SMA's
  Northeast Region               28          21                7
  North Central Region           37          27               10
  South Region                   46          36               10
  West Region                    15          14                1

Other non-SMA PSU's
  Northeast Region               34           9               25
  North Central Region           70           1               69
  South Region                  107           1              106
  West Region                    36           2               34

(1) In detail, 9 self-representing PSU's cross section
lines and are counted twice.
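
The footnote to table 15 explains why the detail lines exceed the totals. A short check, with illustrative names and the table's own figures:

```python
# (total, self-representing, nonself-representing) for each detail line.
DETAIL = {
    "8 largest SMA's": (8, 8, 0),
    "Other SMA's, Northeast": (28, 21, 7),
    "Other SMA's, North Central": (37, 27, 10),
    "Other SMA's, South": (46, 36, 10),
    "Other SMA's, West": (15, 14, 1),
    "Non-SMA, Northeast": (34, 9, 25),
    "Non-SMA, North Central": (70, 1, 69),
    "Non-SMA, South": (107, 1, 106),
    "Non-SMA, West": (36, 2, 34),
}
COUNTED_TWICE = 9   # self-representing PSU's that cross section lines

detail_total = sum(row[0] for row in DETAIL.values())
detail_self = sum(row[1] for row in DETAIL.values())
detail_nonself = sum(row[2] for row in DETAIL.values())

net_total = detail_total - COUNTED_TWICE
net_self = detail_self - COUNTED_TWICE
print(net_total, net_self, detail_nonself)
```

After removing the 9 PSU's counted twice, the detail agrees with the totals of 372, 110, and 262.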